Re: [PATCH v2] Deprecate the "-no-acpi" command line switch
On Fri, 24 Feb 2023 10:05:43 +0100 Thomas Huth wrote:
> Similar to "-no-hpet", the "-no-acpi" switch is a legacy command
> line option that should be replaced with the "acpi" machine parameter
> nowadays.
>
> Signed-off-by: Thomas Huth

Reviewed-by: Igor Mammedov

> ---
> v2: Fixed stupid copy-n-paste bug (Thanks to Sunil for spotting it!)
>
>  docs/about/deprecated.rst | 6 ++++++
>  softmmu/vl.c              | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index ee95bcb1a6..15084f7bea 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -99,6 +99,12 @@ form is preferred.
>  The HPET setting has been turned into a machine property.
>  Use ``-machine hpet=off`` instead.
>
> +``-no-acpi`` (since 8.0)
> +''''''''''''''''''''''''
> +
> +The ``-no-acpi`` setting has been turned into a machine property.
> +Use ``-machine acpi=off`` instead.
> +
>  ``-accel hax`` (since 8.0)
>  ''''''''''''''''''''''''''
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 459588aa7d..a3c59b5462 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3271,6 +3271,7 @@ void qemu_init(int argc, char **argv)
>              vnc_parse(optarg);
>              break;
>          case QEMU_OPTION_no_acpi:
> +            warn_report("-no-acpi is deprecated, use '-machine acpi=off' instead");
>              qdict_put_str(machine_opts_dict, "acpi", "off");
>              break;
>          case QEMU_OPTION_no_hpet:
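The mapping behind this deprecation is mechanical: each legacy switch becomes a `-machine` property. A minimal sketch of that translation (the `modernize` helper and its lookup table are invented for illustration; the flag-to-property pairs themselves come from the deprecation notes above):

```python
# Hypothetical helper: rewrite deprecated QEMU switches into the
# modern '-machine key=value' form named in docs/about/deprecated.rst.
DEPRECATED_TO_MACHINE = {
    "-no-hpet": ("hpet", "off"),
    "-no-acpi": ("acpi", "off"),
}

def modernize(argv):
    """Replace each deprecated switch with a '-machine key=off' pair."""
    out = []
    for arg in argv:
        if arg in DEPRECATED_TO_MACHINE:
            key, val = DEPRECATED_TO_MACHINE[arg]
            out += ["-machine", f"{key}={val}"]
        else:
            out.append(arg)
    return out

print(modernize(["qemu-system-x86_64", "-no-acpi", "-m", "2G"]))
# -> ['qemu-system-x86_64', '-machine', 'acpi=off', '-m', '2G']
```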
Re: [PATCH] tests: qemucapabilities: Update qemu caps dump for the qemu-7.0.0 release on x86_64
On Wed, 20 Apr 2022 14:13:00 +0200 Peter Krempa wrote:
> On Wed, Apr 20, 2022 at 14:00:52 +0200, Igor Mammedov wrote:
> > On Wed, 20 Apr 2022 12:21:03 +0100
> > Daniel P. Berrangé wrote:
> > > On Wed, Apr 20, 2022 at 01:15:43PM +0200, Igor Mammedov wrote:
> > > > On Wed, 20 Apr 2022 13:02:12 +0200
> > > > Peter Krempa wrote:
> > > > > Few minor changes in qemu since the last update:
> > > > > - PIIX4_PM gained 'x-not-migrate-acpi-index' property
> > > >
> > > > do you do this just for every new property?
> > > > (nothing outside of QEMU needs to know about x-not-migrate-acpi-index,
> > > > unless one is interested in whether it works or not)
> > >
> > > This is simply a record of what QEMU reports when you query properties
> > > for the devices libvirt cares about.
> >
> > I was just curious why libvirt does it.
> >
> > > If nothing outside is supposed to
> > > know about x-not-migrate-acpi-index then QEMU shouldn't tell us about
> > > it when asked for properties :-)
> >
> > Does libvirt use/expose x- prefixed properties anywhere?
> > (i.e. can QEMU hide them?)
>
> I don't think it's needed to hide them. In fact we have strong rules
> against using them.
>
> With one notable exception:
>
>   -object memory-backend-file,x-use-canonical-path-for-ramblock-id=
>
> But this was discussed extensively on the qemu list and qemu pledges
> that this specific property is considered stable.

OK, let's leave it as is.
Re: [PATCH] tests: qemucapabilities: Update qemu caps dump for the qemu-7.0.0 release on x86_64
On Wed, 20 Apr 2022 12:21:03 +0100 Daniel P. Berrangé wrote:
> On Wed, Apr 20, 2022 at 01:15:43PM +0200, Igor Mammedov wrote:
> > On Wed, 20 Apr 2022 13:02:12 +0200
> > Peter Krempa wrote:
> > > Few minor changes in qemu since the last update:
> > > - PIIX4_PM gained 'x-not-migrate-acpi-index' property
> >
> > do you do this just for every new property?
> > (nothing outside of QEMU needs to know about x-not-migrate-acpi-index,
> > unless one is interested in whether it works or not)
>
> This is simply a record of what QEMU reports when you query properties
> for the devices libvirt cares about.

I was just curious why libvirt does it.

> If nothing outside is supposed to
> know about x-not-migrate-acpi-index then QEMU shouldn't tell us about
> it when asked for properties :-)

Does libvirt use/expose x- prefixed properties anywhere?
(i.e. can QEMU hide them?)

> With regards,
> Daniel
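The rule Peter states below in the thread — never rely on experimental properties — is easy to apply mechanically when consuming a `device-list-properties` reply, since experimental properties are marked by the `x-` name prefix. A minimal sketch (the `stable_properties` helper is invented for illustration; the property names mirror the thread):

```python
def stable_properties(props):
    """Drop experimental 'x-'-prefixed entries from a
    device-list-properties reply, keeping only properties a
    management app may rely on (per libvirt's stated rule,
    modulo the one pledged exception discussed in the thread)."""
    return [p for p in props if not p["name"].startswith("x-")]

# Shape of a device-list-properties reply entry list (abbreviated).
reply = [
    {"name": "acpi-index", "type": "uint32"},
    {"name": "x-not-migrate-acpi-index", "type": "bool"},
]
print([p["name"] for p in stable_properties(reply)])
# -> ['acpi-index']
```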
Re: [PATCH] tests: qemucapabilities: Update qemu caps dump for the qemu-7.0.0 release on x86_64
On Wed, 20 Apr 2022 13:02:12 +0200 Peter Krempa wrote:
> Few minor changes in qemu since the last update:
> - PIIX4_PM gained 'x-not-migrate-acpi-index' property

do you do this just for every new property?
(nothing outside of QEMU needs to know about x-not-migrate-acpi-index,
unless one is interested in whether it works or not)

> - 'cocoa' display and corresponding props (not present in this build)
>
> Changes in build:
> - dbus display driver re-enabled
> - gtk display support re-disabled
> - xen support re-disabled
>
> Signed-off-by: Peter Krempa
> ---
>  .../caps_7.0.0.x86_64.replies | 583 --
>  .../caps_7.0.0.x86_64.xml     |  10 +-
>  2 files changed, 257 insertions(+), 336 deletions(-)
>
> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies
> index d1f453dcca..620442704a 100644
> --- a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies
> +++ b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies
> @@ -17,11 +17,11 @@
>  {
>    "return": {
>      "qemu": {
> -      "micro": 92,
> -      "minor": 2,
> -      "major": 6
> +      "micro": 0,
> +      "minor": 0,
> +      "major": 7
>      },
> -    "package": "v7.0.0-rc2"
> +    "package": "v7.0.0"
>    },
>    "id": "libvirt-2"
> }
> @@ -5119,10 +5119,6 @@
>        "name": "135",
>        "tag": "type",
>        "variants": [
> -        {
> -          "case": "gtk",
> -          "type": "358"
> -        },
>          {
>            "case": "curses",
>            "type": "360"
>          },
>          {
>            "case": "egl-headless",
>            "type": "361"
>          },
> +        {
> +          "case": "dbus",
> +          "type": "362"
> +        },
>          {
>            "case": "default",
>            "type": "0"
> @@ -10498,6 +10498,10 @@
>            "case": "qemu-vdagent",
>            "type": "518"
>          },
> +        {
> +          "case": "dbus",
> +          "type": "519"
> +        },
>          {
>            "case": "vc",
>            "type": "520"
> @@ -11756,9 +11760,6 @@
>          {
>            "name": "none"
>          },
> -        {
> -          "name": "gtk"
> -        },
>          {
>            "name": "sdl"
>          },
> @@ -11770,17 +11771,20 @@
>          },
>          {
>            "name": "spice-app"
> +        },
> +        {
> +          "name": "dbus"
>          }
>        ],
>        "meta-type": "enum",
>        "values": [
>          "default",
>          "none",
> -        "gtk",
>          "sdl",
>          "egl-headless",
>          "curses",
> -        "spice-app"
> +        "spice-app",
> +        "dbus"
>        ]
>      },
>      {
> @@ -16067,6 +16071,9 @@
>        {
>          "name": "qemu-vdagent"
>        },
> +      {
> +        "name": "dbus"
> +      },
>        {
>          "name": "vc"
>        },
> @@ -16097,6 +16104,7 @@
>        "spicevmc",
>        "spiceport",
>        "qemu-vdagent",
> +      "dbus",
>        "vc",
>        "ringbuf",
>        "memory"
> @@ -16202,6 +16210,16 @@
>        ],
>        "meta-type": "object"
>      },
> +    {
> +      "name": "519",
> +      "members": [
> +        {
> +          "name": "data",
> +          "type": "618"
> +        }
> +      ],
> +      "meta-type": "object"
> +    },
>      {
>        "name": "520",
>        "members": [
> @@ -18460,6 +18478,26 @@
>        ],
>        "meta-type": "object"
>      },
> +    {
> +      "name": "618",
> +      "members": [
> +        {
> +          "name": "logfile",
> +          "default": null,
> +          "type": "str"
> +        },
> +        {
> +          "name": "logappend",
> +          "default": null,
> +          "type": "bool"
> +        },
> +        {
> +          "name": "name",
> +          "type": "str"
> +        }
> +      ],
> +      "meta-type": "object"
> +    },
>      {
>        "name": "619",
>        "members": [
> @@ -20363,10 +20401,6 @@
>        "name": "acpi-erst",
>        "parent": "pci-device"
>      },
> -    {
> -      "name": "virtio-crypto-device",
> -      "parent": "virtio-device"
> -    },
>      {
>        "name": "isa-applesmc",
>        "parent": "isa-device"
>      },
> @@ -20379,49 +20413,53 @@
>        "name": "vhost-user-input-pci",
>        "parent": "vhost-user-input-pci-base-type"
>      },
> +    {
> +      "name": "usb-redir",
> +      "parent": "usb-device"
> +    },
>      {
>        "name": "floppy-bus",
>        "parent": "bus"
>      },
>      {
> -      "name": "Denverton-x86_64-cpu",
> -      "parent": "x86_64-cpu"
> +      "name": "virtio-crypto-device",
> +      "parent": "virtio-device"
>      },
>      {
>        "name": "chardev-testdev",
>        "parent": "chardev"
>      },
>      {
> -      "name": "usb-wacom-tablet",
> -      "parent": "usb-device"
> +      "name": "Denverton-x86_64-cpu",
> +      "parent": "x86_64-cpu"
>      },
>      {
> -      "name"
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Mon, 7 Feb 2022 11:48:27 + Daniel P. Berrangé wrote:
> On Mon, Feb 07, 2022 at 12:22:22PM +0100, Igor Mammedov wrote:
> > On Mon, 7 Feb 2022 10:36:42 +0100
> > Peter Krempa wrote:
> > > On Mon, Feb 07, 2022 at 10:18:43 +0100, Igor Mammedov wrote:
> > > > On Mon, 7 Feb 2022 09:14:37 +0100
> > > > Igor Mammedov wrote:
> > > > > On Sat, 5 Feb 2022 13:45:24 +0100
> > > > > Philippe Mathieu-Daudé wrote:
> > > > > > Previously CPUs were exposed in the QOM tree at a path
> > > > > >
> > > > > >   /machine/unattached/device[nn]
> > > > > >
> > > > > > where the 'nn' of the first CPU is usually zero, but can
> > > > > > vary depending on what devices were already created.
> > > > > >
> > > > > > With this change the CPUs are now at
> > > > > >
> > > > > >   /machine/cpu[nn]
> > > > > >
> > > > > > where the 'nn' of the first CPU is always zero.
> > > > >
> > > > > Could you add to commit message the reason behind the change?
> > > >
> > > > Regardless, it looks like unwarranted movement to me,
> > > > prompted by libvirt accessing/expecting a QOM path which is
> > > > not stable ABI. I'd rather get it fixed on the libvirt side.
> > > >
> > > > If libvirt needs for some reason to access a CPU instance,
> > > > it should use @query-hotpluggable-cpus to get a list of CPUs
> > > > (which includes the QOM path of already present CPUs) instead of
> > > > hard-coding some 'well-known' path, as there is no guarantee
> > > > that it will stay stable whatsoever.
> > >
> > > I don't disagree with you about the use of a hardcoded path, but the way
> > > of using @query-hotpluggable-cpus is not really aligning well with how
> > > it's being used.
> > >
> > > To shed a bit more light, libvirt uses the following hardcoded path
> > >
> > >   #define QOM_CPU_PATH "/machine/unattached/device[0]"
> > >
> > > in code which is used to query CPU flags. That code doesn't care at all
> > > which cpus are present but wants to get any of them. So yes, calling
> > > query-hotpluggable-cpus is possible but a bit pointless.
> >
> > Even though query-hotpluggable-cpus is cumbersome,
> > it still lets you avoid hard-coding the QOM path and lets you
> > get away with keeping the "~400 QMP calls" probing while
> > something better comes along.
> >
> > > In general the code probing cpu flags via qom-get is very cumbersome as
> > > it ends up doing ~400 QMP calls at startup of a VM in cases when we deem
> > > it necessary to probe the cpu fully.
> > >
> > > It would be much better (and would sidestep the issue altogether) if we
> > > had a more sane interface to probe all cpu flags in one go, and ideally
> > > the argument specifying the cpu being optional.
> > >
> > > Libvirt can do the adjustment, but for now IMO the path to the first cpu
> > > (/machine/unattached/device[0]) became de-facto ABI by the virtue that
> > > it was used by libvirt and if I remember correctly it was suggested by
> > > the folks dealing with the CPU when the code was added originally.
> >
> > I would've argued against that back then as well;
> > there weren't any guarantees and I wouldn't like a precedent of
> > QOM abuse becoming de-facto ABI.
> > Note: this patch breaks this so-called ABI as well and introduces
> > yet another hard-coded path without any stability guarantee whatsoever.
>
> AFAIK, we've never defined anything about QOM paths wrt ABI one way
> or the other ? In the absence of guidelines then it comes down to

not written in docs anyways (all I have is a vague recollection that
we really didn't want to make the QOM path/tree an ABI).
For more on this topic see the comment at the end.

> what are reasonable expectations of the mgmt app. These expectations
> will be influenced by what it is actually possible to achieve given
> our API as exposed.
>
> I think it is unreasonable to expect /machine/unattached to be
> stable because by its very nature it is just a dumping ground
> for anything where the dev hasn't put in any thought to the path
> placement. IOW, it was/is definitely a bad idea for libvirt to
> r
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Mon, 7 Feb 2022 10:36:42 +0100 Peter Krempa wrote:
> On Mon, Feb 07, 2022 at 10:18:43 +0100, Igor Mammedov wrote:
> > On Mon, 7 Feb 2022 09:14:37 +0100
> > Igor Mammedov wrote:
> > > On Sat, 5 Feb 2022 13:45:24 +0100
> > > Philippe Mathieu-Daudé wrote:
> > > > Previously CPUs were exposed in the QOM tree at a path
> > > >
> > > >   /machine/unattached/device[nn]
> > > >
> > > > where the 'nn' of the first CPU is usually zero, but can
> > > > vary depending on what devices were already created.
> > > >
> > > > With this change the CPUs are now at
> > > >
> > > >   /machine/cpu[nn]
> > > >
> > > > where the 'nn' of the first CPU is always zero.
> > >
> > > Could you add to commit message the reason behind the change?
> >
> > Regardless, it looks like unwarranted movement to me,
> > prompted by libvirt accessing/expecting a QOM path which is
> > not stable ABI. I'd rather get it fixed on the libvirt side.
> >
> > If libvirt needs for some reason to access a CPU instance,
> > it should use @query-hotpluggable-cpus to get a list of CPUs
> > (which includes the QOM path of already present CPUs) instead of
> > hard-coding some 'well-known' path, as there is no guarantee
> > that it will stay stable whatsoever.
>
> I don't disagree with you about the use of a hardcoded path, but the way
> of using @query-hotpluggable-cpus is not really aligning well with how
> it's being used.
>
> To shed a bit more light, libvirt uses the following hardcoded path
>
>   #define QOM_CPU_PATH "/machine/unattached/device[0]"
>
> in code which is used to query CPU flags. That code doesn't care at all
> which cpus are present but wants to get any of them. So yes, calling
> query-hotpluggable-cpus is possible but a bit pointless.

Even though query-hotpluggable-cpus is cumbersome,
it still lets you avoid hard-coding the QOM path and lets you
get away with keeping the "~400 QMP calls" probing while
something better comes along.

> In general the code probing cpu flags via qom-get is very cumbersome as
> it ends up doing ~400 QMP calls at startup of a VM in cases when we deem
> it necessary to probe the cpu fully.
>
> It would be much better (and would sidestep the issue altogether) if we
> had a more sane interface to probe all cpu flags in one go, and ideally
> the argument specifying the cpu being optional.
>
> Libvirt can do the adjustment, but for now IMO the path to the first cpu
> (/machine/unattached/device[0]) became de-facto ABI by the virtue that
> it was used by libvirt and if I remember correctly it was suggested by
> the folks dealing with the CPU when the code was added originally.

I would've argued against that back then as well;
there weren't any guarantees and I wouldn't like a precedent of
QOM abuse becoming de-facto ABI.
Note: this patch breaks this so-called ABI as well and introduces
yet another hard-coded path without any stability guarantee whatsoever.

> Even if we change it in libvirt right away, changing qemu will break
> forward compatibility. While we don't guarantee it, it still creates
> user grief.
>
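Igor's suggested approach — asking `query-hotpluggable-cpus` for a present CPU's QOM path instead of hard-coding `/machine/unattached/device[0]` — can be sketched as follows. Per the QAPI schema, only slots occupied by a present CPU carry a `qom-path` member; the helper name and sample reply below are illustrative:

```python
def first_cpu_qom_path(hotpluggable_cpus):
    """Return the QOM path of any present CPU from a
    query-hotpluggable-cpus reply, or None if no CPU is present.
    Entries without 'qom-path' describe empty (hotpluggable) slots."""
    for entry in hotpluggable_cpus:
        if "qom-path" in entry:
            return entry["qom-path"]
    return None

# Abbreviated reply: one empty slot, one present CPU.
reply = [
    {"props": {"socket-id": 1, "core-id": 0, "thread-id": 0},
     "vcpus-count": 1, "type": "host-x86_64-cpu"},
    {"props": {"socket-id": 0, "core-id": 0, "thread-id": 0},
     "vcpus-count": 1, "type": "host-x86_64-cpu",
     "qom-path": "/machine/unattached/device[0]"},
]
print(first_cpu_qom_path(reply))
# -> /machine/unattached/device[0]
```

This keeps the probing code independent of wherever a given machine type happens to attach its CPUs in the QOM tree.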
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Mon, 7 Feb 2022 09:14:37 +0100 Igor Mammedov wrote:
> On Sat, 5 Feb 2022 13:45:24 +0100
> Philippe Mathieu-Daudé wrote:
> > Previously CPUs were exposed in the QOM tree at a path
> >
> >   /machine/unattached/device[nn]
> >
> > where the 'nn' of the first CPU is usually zero, but can
> > vary depending on what devices were already created.
> >
> > With this change the CPUs are now at
> >
> >   /machine/cpu[nn]
> >
> > where the 'nn' of the first CPU is always zero.
>
> Could you add to commit message the reason behind the change?

Regardless, it looks like unwarranted movement to me,
prompted by libvirt accessing/expecting a QOM path which is
not stable ABI. I'd rather get it fixed on the libvirt side.

If libvirt needs for some reason to access a CPU instance,
it should use @query-hotpluggable-cpus to get a list of CPUs
(which includes the QOM path of already present CPUs) instead of
hard-coding some 'well-known' path, as there is no guarantee
that it will stay stable whatsoever.

> > Note: This (intentionally) breaks compatibility with current
> > libvirt code that looks for "/machine/unattached/device[0]"
> > in the assumption it is the first CPU.
>
> Why does libvirt do this in the first place?
>
> > Cc: libvir-list@redhat.com
> > Suggested-by: Daniel P. Berrangé
> > Reviewed-by: Daniel P. Berrangé
> > Signed-off-by: Philippe Mathieu-Daudé
> > ---
> >  hw/i386/x86.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index b84840a1bb9..50bf249c700 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -108,6 +108,7 @@ void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
> >  {
> >      Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
> >
> > +    object_property_add_child(OBJECT(x86ms), "cpu[*]", OBJECT(cpu));
>
> that will take into account only initial cpus; -device/device_add cpus
> will still go to wherever device_add attaches them (see qdev_set_id)
>
> >      if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
> >          goto out;
> >      }
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Sat, 5 Feb 2022 13:45:24 +0100 Philippe Mathieu-Daudé wrote:
> Previously CPUs were exposed in the QOM tree at a path
>
>   /machine/unattached/device[nn]
>
> where the 'nn' of the first CPU is usually zero, but can
> vary depending on what devices were already created.
>
> With this change the CPUs are now at
>
>   /machine/cpu[nn]
>
> where the 'nn' of the first CPU is always zero.

Could you add to commit message the reason behind the change?

> Note: This (intentionally) breaks compatibility with current
> libvirt code that looks for "/machine/unattached/device[0]"
> in the assumption it is the first CPU.

Why does libvirt do this in the first place?

> Cc: libvir-list@redhat.com
> Suggested-by: Daniel P. Berrangé
> Reviewed-by: Daniel P. Berrangé
> Signed-off-by: Philippe Mathieu-Daudé
> ---
>  hw/i386/x86.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index b84840a1bb9..50bf249c700 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -108,6 +108,7 @@ void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
>  {
>      Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
>
> +    object_property_add_child(OBJECT(x86ms), "cpu[*]", OBJECT(cpu));

that will take into account only initial cpus; -device/device_add cpus
will still go to wherever device_add attaches them (see qdev_set_id)

>      if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
>          goto out;
>      }
Re: [PATCH 5/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Wed, 20 Oct 2021 16:15:29 +0200 Michal Prívozník wrote:
> On 10/20/21 1:18 PM, Peter Krempa wrote:
> > On Wed, Oct 20, 2021 at 13:07:59 +0200, Michal Prívozník wrote:
> >> On 10/6/21 3:32 PM, Igor Mammedov wrote:
> >>> On Thu, 30 Sep 2021 14:08:34 +0200
> >>> Peter Krempa wrote:
> >
> > [...]
> >
> >> 2) In my experiments I try to mimic what libvirt does. Here's my cmd
> >> line:
> >>
> >>   qemu-system-x86_64 \
> >>     -S \
> >>     -preconfig \
> >>     -cpu host \
> >>     -smp 120,sockets=2,dies=3,cores=4,threads=5 \
> >>     -object '{"qom-type":"memory-backend-memfd","id":"ram-node0","size":4294967296,"host-nodes":[0],"policy":"bind"}' \
> >>     -numa node,nodeid=0,memdev=ram-node0 \
> >>     -no-user-config \
> >>     -nodefaults \
> >>     -no-shutdown \
> >>     -qmp stdio
> >>
> >> and here is my QMP log:
> >>
> >>   {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 6}, "package": "v6.1.0-1552-g362534a643"}, "capabilities": ["oob"]}}
> >>
> >>   {"execute":"qmp_capabilities"}
> >>   {"return": {}}
> >>
> >>   {"execute":"query-hotpluggable-cpus"}
> >>   {"return": [{"props": {"core-id": 3, "thread-id": 4, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 3, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 2, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 1, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 0, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 2, "thread-id": 4, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>
> >>     {"props": {"core-id": 0, "thread-id": 0, "die-id": 0, "socket-id": 0}, "vcpus-count": 1, "type": "host-x86_64-cpu"}]}
> >>
> >> I can see that query-hotpluggable-cpus returns an array. Can I safely
> >> assume that vCPU ID == index in the array? I mean, if I did have -numa
> >
> > No, this assumption would be incorrect on the aforementioned PPC
> > platform where one entry in the returned array can describe multiple
> > cores.
> >
> > qemuDomainFilterHotplugVcpuEntities is the code that cross-references
> > the libvirt "index" with the data returned by query-hotpluggable-cpus.
> >
> > The important bit is the 'vcpus-count' property. The code which deals
> > with hotplug is already fetching everything that's needed.
>
> Ah, I see. So my assumption would be correct if vcpus-count would be 1
> for all entries. If it isn't then I need to account for how much

Only for some boards. An entry in the array describes a single entity
that should be handled as a single device by the user
(-device/plug/unplug/other mapping options), and the entity might have
1 or more vCPUs (threads) depending on the target arch/board.

> vcpus-count is in each entity. Fair enough. But
> qemuDomainFilterHotplugVcpuEntities() doesn't really do vCPU ID ->
> [socket, core, thread] translation, does it?
>
> But even if it did, I am still wondering what the purpose of this whole
> exercise is. QEMU won't be able to drop the ID -> [socket, core, thread]
> mapping. The only thing it would be able to drop is a few lines of code
> handling the command line. Am I missing something obvious?

I described in another email why QEMU is dropping cpu_index on external
interfaces (it's possible to drop it internally too, but I don't see
much gain there vs the effort such refactoring would require).

Sure thing, you can invent/maintain a libvirt-internal "vCPU ID" ->
[topo props] mapping if it's necessary. However, using just a "vCPU ID"
will obscure topology information from upper layers. Maybe providing a
list of CPUs as an external interface would be better; then users can
pick which CPUs they wish to add/delete/assign/... using items from
that list.

> Michal
>
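The `vcpus-count` bookkeeping discussed above can be sketched as a vCPU-ID-to-entry mapping. The helper and sample data are invented for illustration; the PPC64-like shape (four cores of eight threads exposed as four entries) follows the example earlier in the thread:

```python
def vcpu_id_map(entries):
    """Map sequential vCPU IDs onto indices of query-hotpluggable-cpus
    entries, honouring 'vcpus-count': on some boards (e.g. PPC64 with
    guest-only threads) one entry covers a whole core's worth of vCPUs."""
    mapping, vcpu = {}, 0
    for idx, entry in enumerate(entries):
        for _ in range(entry.get("vcpus-count", 1)):
            mapping[vcpu] = idx
            vcpu += 1
    return mapping

# Four entries, each covering eight threads (32 vCPUs total).
ppc_like = [{"props": {"core-id": c}, "vcpus-count": 8}
            for c in (0, 8, 16, 24)]
m = vcpu_id_map(ppc_like)
print(m[0], m[7], m[8], m[31])
# -> 0 0 1 3
```

On x86, where every entry has `vcpus-count` of 1, the mapping degenerates to the identity, which is why the naive index assumption happens to work there.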
Re: [PATCH 5/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Wed, 20 Oct 2021 13:07:59 +0200 Michal Prívozník wrote:
> On 10/6/21 3:32 PM, Igor Mammedov wrote:
> > On Thu, 30 Sep 2021 14:08:34 +0200
> > Peter Krempa wrote:
> >
> >> On Tue, Sep 21, 2021 at 16:50:31 +0200, Michal Privoznik wrote:
> >>> QEMU is trying to obsolete -numa node,cpus= because that uses
> >>> ambiguous vCPU id to [socket, die, core, thread] mapping. The new
> >>> form is:
> >>>
> >>>   -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T
> >>>
> >>> which is repeated for every vCPU and places it at [S, D, C, T]
> >>> into guest NUMA node N.
> >>>
> >>> While in general this is magic mapping, we can deal with it.
> >>> Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology
> >>> is given then maxvcpus must be sockets * dies * cores * threads
> >>> (i.e. there are no 'holes').
> >>> Secondly, if no topology is given then libvirt itself places each
> >>> vCPU into a different socket (basically, it fakes topology of:
> >>> [maxvcpus, 1, 1, 1])
> >>> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs
> >>> onto topology, to make sure vCPUs don't start to move around.
> >>
> >> There's a problem with this premise though and unfortunately we don't
> >> seem to have qemuxml2argvtest for it.
> >>
> >> On PPC64, in certain situations the CPU can be configured such that
> >> threads are visible only to VMs. This has substantial impact on how CPUs
> >> are configured using the modern parameters (until now used only for
> >> cpu hotplug purposes, and that's the reason vCPU hotplug has such
> >> complicated incantations when starting the VM).
> >>
> >> In the above situation a CPU with topology of:
> >>   sockets=1, cores=4, threads=8 (thus 32 cpus)
> >>
> >> will only expose 4 CPU "devices":
> >>
> >>   core-id: 0, core-id: 8, core-id: 16 and core-id: 24
> >>
> >> yet the guest will correctly see 32 cpus when used as such.
> >>
> >> You can see this in:
> >>
> >>   tests/qemuhotplugtestcpus/ppc64-modern-individual-monitor.json
> >>
> >> Also note that the 'props' object does _not_ have any socket-id, and
> >> management apps are supposed to pass in 'props' as is. (There's a bunch
> >> of code to do that on hotplug.)
> >>
> >> The problem is that you need to query the topology first (unless we want
> >> to duplicate all of the qemu code that has to do with topology state and
> >> keep up with changes to it) to know how it's behaving on the current
> >> machine. This historically was not possible. The supposed solution for
> >> this was the pre-config state where we'd be able to query and set it up
> >> via QMP, but I was not keeping up sufficiently with that work, so I
> >> don't know if it's possible.
> >>
> >> If preconfig is a viable option we IMO should start using it sooner
> >> rather than later and avoid duplicating qemu's logic here.
> >
> > Using preconfig is the preferable variant, otherwise libvirt
> > would end up duplicating topology logic which differs not only
> > between targets but also between machine/cpu types.
> >
> > The closest example of how to use preconfig is in the
> > pc_dynamic_cpu_cfg() test case. Though it uses query-hotpluggable-cpus
> > only for verification, one can use the command at the preconfig
> > stage to get the topology for a given -smp/-machine type combination.
>
> Alright, -preconfig should be pretty easy. However, I do have some
> points to raise/ask:
>
> 1) currently, exit-preconfig is marked as experimental (hence its "x-"
> prefix). Before libvirt consumes it, QEMU should make it stable. Is
> there anything that stops QEMU from doing so or is it just a matter of
> sending patches (I volunteer to do that)?

If I recall correctly, it was made experimental due to the lack of
actual users (it was supposed that libvirt would consume it once
available, but that didn't happen for quite a long time).
So patches to make it a stable interface should be fine.

> 2) In my experiments I try to mimic what libvirt does. Here's my cmd
> line:
>
>   qemu-system-x86_64 \
>     -S \
>     -preconfig \
>     -cpu host \
>     -smp 120,sockets=2,dies=3,cores=4,threads=5 \
>     -object '{"qom-type":"memory-backend-memfd","id":"ram-node0","
Re: [PATCH 4/5] qemuBuildNumaCommandLine: Separate out building of CPU list
On Thu, 30 Sep 2021 13:33:24 +0200 Peter Krempa wrote:
> On Tue, Sep 21, 2021 at 16:50:30 +0200, Michal Privoznik wrote:
> > Signed-off-by: Michal Privoznik
> > ---
> >  src/qemu/qemu_command.c | 43 ++---
> >  1 file changed, 27 insertions(+), 16 deletions(-)
>
> Reviewed-by: Michal Privoznik

^^^ copy-paste error :)
Re: [PATCH 5/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Thu, 30 Sep 2021 14:08:34 +0200 Peter Krempa wrote:
> On Tue, Sep 21, 2021 at 16:50:31 +0200, Michal Privoznik wrote:
> > QEMU is trying to obsolete -numa node,cpus= because that uses
> > ambiguous vCPU id to [socket, die, core, thread] mapping. The new
> > form is:
> >
> >   -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T
> >
> > which is repeated for every vCPU and places it at [S, D, C, T]
> > into guest NUMA node N.
> >
> > While in general this is magic mapping, we can deal with it.
> > Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology
> > is given then maxvcpus must be sockets * dies * cores * threads
> > (i.e. there are no 'holes').
> > Secondly, if no topology is given then libvirt itself places each
> > vCPU into a different socket (basically, it fakes topology of:
> > [maxvcpus, 1, 1, 1])
> > Thirdly, we can copy whatever QEMU is doing when mapping vCPUs
> > onto topology, to make sure vCPUs don't start to move around.
>
> There's a problem with this premise though and unfortunately we don't
> seem to have qemuxml2argvtest for it.
>
> On PPC64, in certain situations the CPU can be configured such that
> threads are visible only to VMs. This has substantial impact on how CPUs
> are configured using the modern parameters (until now used only for
> cpu hotplug purposes, and that's the reason vCPU hotplug has such
> complicated incantations when starting the VM).
>
> In the above situation a CPU with topology of:
>   sockets=1, cores=4, threads=8 (thus 32 cpus)
>
> will only expose 4 CPU "devices":
>
>   core-id: 0, core-id: 8, core-id: 16 and core-id: 24
>
> yet the guest will correctly see 32 cpus when used as such.
>
> You can see this in:
>
>   tests/qemuhotplugtestcpus/ppc64-modern-individual-monitor.json
>
> Also note that the 'props' object does _not_ have any socket-id, and
> management apps are supposed to pass in 'props' as is. (There's a bunch
> of code to do that on hotplug.)
>
> The problem is that you need to query the topology first (unless we want
> to duplicate all of the qemu code that has to do with topology state and
> keep up with changes to it) to know how it's behaving on the current
> machine. This historically was not possible. The supposed solution for
> this was the pre-config state where we'd be able to query and set it up
> via QMP, but I was not keeping up sufficiently with that work, so I
> don't know if it's possible.
>
> If preconfig is a viable option we IMO should start using it sooner
> rather than later and avoid duplicating qemu's logic here.

Using preconfig is the preferable variant, otherwise libvirt
would end up duplicating topology logic which differs not only
between targets but also between machine/cpu types.

The closest example of how to use preconfig is in the
pc_dynamic_cpu_cfg() test case. Though it uses query-hotpluggable-cpus
only for verification, one can use the command at the preconfig
stage to get the topology for a given -smp/-machine type combination.

> > Note, migration from old to new cmd line works and therefore
> > doesn't need any special handling.
> >
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085
> > Signed-off-by: Michal Privoznik
> > ---
> >  src/qemu/qemu_command.c                       | 112 +-
> >  .../hugepages-nvdimm.x86_64-latest.args       |   4 +-
> >  ...memory-default-hugepage.x86_64-latest.args |  10 +-
> >  .../memfd-memory-numa.x86_64-latest.args      |  10 +-
> >  ...y-hotplug-nvdimm-access.x86_64-latest.args |   4 +-
> >  ...ory-hotplug-nvdimm-align.x86_64-5.2.0.args |   4 +-
> >  ...ry-hotplug-nvdimm-align.x86_64-latest.args |   4 +-
> >  ...ory-hotplug-nvdimm-label.x86_64-5.2.0.args |   4 +-
> >  ...ry-hotplug-nvdimm-label.x86_64-latest.args |   4 +-
> >  ...mory-hotplug-nvdimm-pmem.x86_64-5.2.0.args |   4 +-
> >  ...ory-hotplug-nvdimm-pmem.x86_64-latest.args |   4 +-
> >  ...-hotplug-nvdimm-readonly.x86_64-5.2.0.args |   4 +-
> >  ...hotplug-nvdimm-readonly.x86_64-latest.args |   4 +-
> >  .../memory-hotplug-nvdimm.x86_64-latest.args  |   4 +-
> >  ...mory-hotplug-virtio-pmem.x86_64-5.2.0.args |   4 +-
> >  ...ory-hotplug-virtio-pmem.x86_64-latest.args |   4 +-
> >  .../numatune-hmat.x86_64-latest.args          |  18 ++-
> >  ...emnode-restrictive-mode.x86_64-latest.args |  38 +-
> >  .../numatune-memnode.x86_64-5.2.0.args        |  38 +-
> >  .../numatune-memnode.x86_64-latest.args       |  38 +-
> >  ...vhost-user-fs-fd-memory.x86_64-latest.args |   4 +-
> >  ...vhost-user-fs-hugepages.x86_64-latest.args |   4 +-
> >  ...host-user-gpu-secondary.x86_64-latest.args |   3 +-
> >  .../vhost-user-vga.x86_64-latest.args         |   3 +-
> >  24 files changed, 296 insertions(+), 34 deletions(-)
> >
> > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> > index f04ae1e311..5192bd7630 100644
> > --- a/src/qemu/qemu_command.c
> > +++ b/src/qemu/qemu_command.c
>
> [...]
>
> > @@ -7432,6 +7432,94 @@ qemuBuildNumaCPUs(virBuffer *buf,
>
Re: [PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and acpi-root-hotplug pm options
On Tue, 28 Sep 2021 11:47:26 +0100 Daniel P. Berrangé wrote: > On Tue, Sep 28, 2021 at 03:28:04PM +0530, Ani Sinha wrote: > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > On Tue, Sep 28, 2021 at 02:35:47PM +0530, Ani Sinha wrote: > > > > > > > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > > > > > On Sun, Sep 12, 2021 at 08:56:29AM +0530, Ani Sinha wrote: > > > > > > This change introduces libvirt xml support for the following two pm > > > > > > options: [...] > > > > The switch in libvirt for pcie-root-ports > > > > currently does not care whether native or acpi hotplug is used. It > > > > simply > > > > turns on the hotplug for that particular port. Whether ACPI or native is > > > > used is controlled by this global flag that Julia has introduced in > > > > 6.1. > > > Right so we have > > > *1*) following applies to piix4/q35: * ACPI hotplug when enabled, affects _only_ cold-plugged 'bridges' since it requires 'slots' being described in DSDT table which in current impl. is static table built at reset time. (i.e. built-in or 'bridges' specified on command line, where 'bridges' could be PCI-PCI or PCIe-PCI or root/downstream-ports') for anything else ('bridges' added with device_add) native hotplug is in use (whether it's SHPC or PCI-E native). ACPI hotplug wiring is done by calling qbus_set_hotplug_handler() * for root bus piix4_pm_realize()/ich9_pm_init() * for anything else acpi_pcihp_device_plug_cb() > > > * PIIX4 > > > > > > - acpi-root-pci-hotplug=bool > > > > > > Whether hotplug is enabled for the root bridge or not > > > > > >for pci-root controller > > > > > > > > > - acpi-pci-hotplug-with-bridge-support=bool > > > > > > Toggles support for ACPI based hotplug across all bridges. > > > If disabled will there will be no hotplug at all for PIIX4 ? > > > Or does 'shpc' come into play in that scenario ? 'SHPC' hotplug kicks in if it's enabled. 
(defaults to 'on' except the 2.9 machine type) On the q35/ACPI side of things we always advertise _all_ available hotplug methods, see build_q35_osc_method(): /* * Always allow native PME, AER (no dependencies) * Allow SHPC (PCI bridges can have SHPC controller) */ aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl)); bits 0, 1 are Native PCI-E hotplug and SHPC respectively. For PIIX4 we don't have _OSC, so it's up to the guest OS to make up the supported methods. In order of preference: * Windows supports ACPI hotplug, then Native PCI-E (SHPC never worked there) * Linux supports ACPI hotplug, SHPC, Native PCI-E (SHPC worked poorly due to the need to reserve IO for bridges; IO reservation hinting was implemented later by Marcel) > > >PIIX combinations > > > > > >(1) acpi-root-pci-hotplug=yes > > >acpi-pci-hotplug-with-bridge-support=yes > > > > > > - All bridges have hotplug > > > > > >(2) acpi-root-pci-hotplug=yes > > >acpi-pci-hotplug-with-bridge-support=no > > > > > > - No bridges have hotplug > > > > > >(3) acpi-root-pci-hotplug=no > > >acpi-pci-hotplug-with-bridge-support=yes > > > > > > - All bridges except root have hotplug requested by the Proxmox guys, to battle a Windows 'feature' that lets any user unplug the sole NIC using an icon on the taskbar. (Laine mentioned we have similar per-port control for PCI-E (the 'hotplug' property) that was requested by other users, probably for the same reason.) So acpi-root-pci-hotplug is similar to pcie-root-port.hotplug, with the difference that the former applies to the whole root bus on PIIX4, while the latter can be controlled per root port. > > >(4) acpi-root-pci-hotplug=no > > >acpi-pci-hotplug-with-bridge-support=no > > > > > > - No bridges have hotplug. Essentially identical to (2) > > > > no (4) is not identical to (2). In (4) no hotplug is enabled. In (2) pci > > root bus still has hotplug enabled. 
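For reference, the 0x1F mask in the quoted build_q35_osc_method() snippet corresponds to the low five bits of the _OSC control field. A quick sketch decoding it (bit meanings per the PCI Firmware spec; the short labels are mine):

```python
# _OSC control-field bits (PCI Firmware spec); QEMU's 0x1F grants all five.
OSC_PCIE_NATIVE_HOTPLUG = 1 << 0
OSC_SHPC_NATIVE_HOTPLUG = 1 << 1
OSC_PCIE_NATIVE_PME     = 1 << 2
OSC_AER                 = 1 << 3
OSC_PCIE_CAP_CONTROL    = 1 << 4

def granted_methods(ctrl):
    """Return the names of the _OSC control bits granted to the guest OS."""
    names = [
        (OSC_PCIE_NATIVE_HOTPLUG, "pcie-native-hotplug"),
        (OSC_SHPC_NATIVE_HOTPLUG, "shpc-hotplug"),
        (OSC_PCIE_NATIVE_PME, "native-pme"),
        (OSC_AER, "aer"),
        (OSC_PCIE_CAP_CONTROL, "pcie-cap-structure"),
    ]
    return [name for bit, name in names if ctrl & bit]
```

This is why the discussion focuses on bits 0 and 1: they are the two hotplug methods (native PCIe and SHPC) the guest can be granted alongside, or instead of, ACPI hotplug.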
> > So you're saying that acpi-root-pci-hotplug=yes overrides the > global request acpi-pci-hotplug-with-bridge-support=no and > turns ACPI hotplug back on for the pcie-root historically ACPI hotplug on root bus was always supported without any option, i.e. acpi-root-pci-hotplug=yes by default. acpi-pci-hotplug-with-bridge-support does what its name claims - i.e. adds hotplug for bridges (at least on PIIX4). > > > * Q35 clarification [*1*] still applies > > > > > > > > > - acpi-pci-hotplug-with-bridge-support=bool > > > > > > Toggles support for ACPI based hotplug. If disabled native > > > PCIe hotplug is activated instead > > > > > > > > > * pcie-root-port > > > > > > - hotplug=bool > > > > > > Toggle
Re: [PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and acpi-root-hotplug pm options
On Tue, 28 Sep 2021 11:59:42 +0100 Daniel P. Berrangé wrote: > On Tue, Sep 28, 2021 at 11:47:26AM +0100, Daniel P. Berrangé wrote: > > On Tue, Sep 28, 2021 at 03:28:04PM +0530, Ani Sinha wrote: > > > > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > > > On Tue, Sep 28, 2021 at 02:35:47PM +0530, Ani Sinha wrote: > > > > > > > > > > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > > > > > > > On Sun, Sep 12, 2021 at 08:56:29AM +0530, Ani Sinha wrote: > > > > > > > This change introduces libvirt xml support for the following two > > > > > > > pm options: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +``acpi-hotplug-bridge`` > > > > > > > + :since:`Since 7.8.0` This option enables or disables BIOS > > > > > > > ACPI based hotplug support > > > > > > > + for cold plugged bridges. It is available only for x86 > > > > > > > guests, both for q35 and pc > > > > > > > + machine types. For pc machines, the support is available from > > > > > > > `QEMU 2.12`. For q35 > > > > > > > + machines, the support is available from `QEMU 6.1`. Examples > > > > > > > of cold plugged bridges > > > > > > > + include PCI-PCI bridges for pc machine types (pci-bridge > > > > > > > controller). For q35 machines, > > > > > > > + it includes PCIE root ports (pcie-root-port controller). This > > > > > > > is a global option that > > > > > > > + affects all bridges. No other bridge specific option is > > > > > > > required to be specified. > > > > > > > > > > > > Can you confirm my understanding of the situation.. > > > > > > > > > > > > - i440fx / PCI topology - hotplug always uses ACPI > > > > > > > > > > > > > > > > ACPI is the primary means of enabling hotplug. shpc might also have a > > > > > role > > > > > here but I think it is disabled. Igor (cc'd) might throw some lights > > > > > on > > > > > how shpc comes to play. 
> > > > > > > > Yes, I think it will be important to understand if 'shpc' becomes > > > > relevant > > > > when ACPI hotplug is disabled for PCI > > > > > > > > > > > > > > > - q35 / PCIe topology - hotplug historically used native PCIe > > > > > > hotplug, > > > > > > but in 6.1 switched to ACPI > > > > > > > > > > > > > > > > Correct. > > > > > > > > > > > Given, the name "acpi-hotplug-bridge", am I right that this option > > > > > > has *no* effect, if the q35 machine is using native PCIe hotplug > > > > > > approach ? > > > > > > > > > > Its complicated. > > > > > With "acpi-hotplug-bridge" ON, native hotplug is disabled in qemu. > > > > > With "acpi-hotplug-bridge" OFF, native hotplug is enabled in qemu. > > > > > > > > Oh, I mis-read and didn't realize this was controlling the QEMU > > > > "acpi-pci-hotplug-with-bridge-support" configuration. > > > > > > > > With this in mind I think the naming is somewhat misleading. Setting it > > > > to off would give users the impression that hotplug is disabled, which > > > > is not the case for Q35 at least. It is just switching to a different > > > > hotplug implementation. > > > > > > > > At least from Q35 pov, I think it would be better to call it > > > > > > > > hotplug-mode="acpi|pcie" > > > > > > > > so it is clear that no matter what value it is set to, hotplug > > > > is still available. > > > > > > > > If we also consider PIIX, then depending on the answer wrt shpc > > > > above, we might want one of > > > > > > > > hotplug-mode="acpi|pcie|none" > > > > hotplug-mode="acpi|pcie|shpc" > > > > > > > > > > If libvirt does not deal with shpc today I think we should not bother with > > > shpc at all. We should simply have a boolean mode appropriately named that > > > choses between acpi hotplug vs native. > > > > I want to understand what's possible at the qemu hardware level, > > so we don't paint ourselves into a corner. 
> > > > IIUC, with shpc we only have a toggle on "pci-bridge" devices, > > and those currently have shpc=true by default. There's no shpc > > setting on the pci-root, and there's no global setting. > > Oops, I was misled. They have shpc=false by default due to machine > types >= 2.9 overriding it to false If I read it correctly, shpc is on by default (modulo 2.9, see commit 2fa356629ed2) > > > Seems to imply that if we have acpi-hotplug disabled for PIIX, > > then there would be no hotplug on the pci-root, but shpc hotplug > > would still be available on any pci-bridge devices ? > > Regards, > Daniel
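Putting the four PIIX4 combinations from this thread in one place: the two switches act independently, one governing the root bus and the other the cold-plugged bridges. A simplified model of the discussion (not QEMU code):

```python
def piix4_acpi_hotplug(root_hotplug, bridge_support):
    """Map the two PIIX4 switches to where ACPI hotplug is active.
    Per the thread: combination (2) root=on/bridges=off still keeps
    hotplug on the root bus, which is why it differs from (4)."""
    return {
        "pci-root": root_hotplug,                # acpi-root-pci-hotplug
        "cold-plugged bridges": bridge_support,  # acpi-pci-hotplug-with-bridge-support
    }
```

Whether SHPC or native hotplug then fills in for buses without ACPI hotplug is up to the guest OS, as discussed above.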
Re: [PATCH] Deprecate pmem=on with non-DAX capable backend file
On Wed, 28 Apr 2021 12:29:30 -0400 Eduardo Habkost wrote: > On Tue, Apr 27, 2021 at 04:48:48PM -0400, Eduardo Habkost wrote: > > On Mon, Jan 11, 2021 at 03:33:32PM -0500, Igor Mammedov wrote: > > > It is not safe to pretend that emulated NVDIMM supports > > > persistence while backend actually failed to enable it > > > and used non-persistent mapping as fall back. > > > Instead of falling-back, QEMU should be more strict and > > > error out with clear message that it's not supported. > > > So if user asks for persistence (pmem=on), they should > > > store backing file on NVDIMM. > > > > > > Signed-off-by: Igor Mammedov > > > Reviewed-by: Philippe Mathieu-Daudé > > > > I'm queueing this for 6.1, after changing "since 6.0" to "since 6.1". > > > > Sorry for letting it fall through the cracks. > > This caused build failures[1] and I had to apply the following > fixup. Thanks! > > [1] https://gitlab.com/ehabkost/qemu/-/jobs/1216917482#L3444 > > Signed-off-by: Eduardo Habkost > --- > docs/system/deprecated.rst | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > index cc8d810be1a..c55c4bceb00 100644 > --- a/docs/system/deprecated.rst > +++ b/docs/system/deprecated.rst > @@ -257,6 +257,7 @@ is (a) not DAX capable or (b) not on a filesystem that > support direct mapping > of persistent memory, is not safe and may lead to data loss or corruption in > case > of host crash. > Options are: > + > - modify VM configuration to set ``pmem=off`` to continue using fake > NVDIMM >(without persistence guaranties) with backing file on non DAX storage > - move backing file to NVDIMM storage and keep ``pmem=on``
Re: [libvirt PATCH 1/6] conf: add support for for PCI devices
On Thu, 8 Apr 2021 09:39:43 +0100 Daniel P. Berrangé wrote: > On Wed, Apr 07, 2021 at 10:23:37PM +0200, Igor Mammedov wrote: > > On Wed, 7 Apr 2021 13:40:03 +0100 > > Daniel P. Berrangé wrote: > > > > > On Wed, Apr 07, 2021 at 09:17:36AM +0200, Peter Krempa wrote: > > > > On Tue, Apr 06, 2021 at 16:31:32 +0100, Daniel Berrange wrote: > > > > > PCI devices can be associated with a unique integer index that is > > > > > exposed via ACPI. In Linux OS with systemd, this value is used for > > > > > provide a NIC device naming scheme that is stable across changes > > > > > in PCI slot configuration. > > > > > > > > > > Signed-off-by: Daniel P. Berrangé > > > > > --- > > > > > docs/formatdomain.rst | 6 +++ > > > > > docs/schemas/domaincommon.rng | 73 > > > > > +++ > > > > > src/conf/device_conf.h| 3 ++ > > > > > src/conf/domain_conf.c| 12 ++ > > > > > 4 files changed, 94 insertions(+) > > > > > > > > > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst > > > > > index 7ba32ea9c1..5db0aac77a 100644 > > > > > --- a/docs/formatdomain.rst > > > > > +++ b/docs/formatdomain.rst > > > > > @@ -4363,6 +4363,7 @@ Network interfaces > > > > > > > > > > > > > > > > > > > > + > > > > > > > > > > > > > > > ... > > > > > @@ -4389,6 +4390,11 @@ when it's in the reserved VMware range by > > > > > adding a ``type="static"`` attribute > > > > > to the element. Note that this attribute is useless if > > > > > the provided > > > > > MAC address is outside of the reserved VMWare ranges. > > > > > > > > > > +:since:`Since 7.3.0`, one can set the ACPI index against network > > > > > interfaces. > > > > > +With some operating systems (eg Linux with systemd), the ACPI index > > > > > is used > > > > > +to provide network interface device naming, that is stable across > > > > > changes > > > > > +in PCI addresses assigned to the device. > > > > > > > > Any range limits or uniqueness requirements worth mentioning? 
> > > > > > Yes, its required to be unique and below (16 * 1024 - 1) because > > > for some reason QEMU chose to artificially limit its value to > > > match systemd's limit. This is a bit dubious IMHO, as the host > > > should not enforce policy for things that are decided by the > > > guest. > > dropping limit would just postpone error till guest boots > > with effect that 'oboard' naming won't be used and systemd > > will fallback to the next available method. > > That's no big deal - the user will easily see this and change their > config. It is a mere docs problem at most. > > > Given that systemd is the sole known user of this feature, > > it seemed better to me to error out at QEMU start rather than > > waiting till guests boots and let user figure out what's wrong. > > > > If we find another user for the feature that supports full range > > we can drop limit easily without any compat issues. > > There must be other users of this feature, given that we're using > a facility that is part of a formal ACPI specification that existed > before systemd had this feature. Given that I think it is very > bad practice to apply a limit host side that's tied to a single > guest usecase, regardless of whether we happen to know about the > other users. We're basically creating a bug in QEMU upfront that > doesn't need to exist. Ok, I'll post a patch to remove limit once 6.1 dev window is open. > > Regards, > Daniel
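The limit under discussion, sketched as the validation a client could do up front. The 16 * 1024 - 1 cap and the uniqueness requirement come from the mail above; the constant name and the function are mine, not QEMU's:

```python
ACPI_INDEX_MAX = 16 * 1024 - 1  # cap chosen to match systemd's onboard-index limit

def check_acpi_index(index, already_assigned):
    """Front-load the checks described above: acpi-index must be
    non-zero, unique per machine, and (while the QEMU-side cap is in
    place) not exceed systemd's limit.  Sketch only -- QEMU enforces
    this itself when the device is realized."""
    if not 0 < index <= ACPI_INDEX_MAX:
        raise ValueError(f"acpi-index must be in 1..{ACPI_INDEX_MAX}")
    if index in already_assigned:
        raise ValueError(f"acpi-index {index} is already in use")
    already_assigned.add(index)
```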
Re: [libvirt PATCH 1/6] conf: add support for for PCI devices
On Wed, 7 Apr 2021 13:40:03 +0100 Daniel P. Berrangé wrote: > On Wed, Apr 07, 2021 at 09:17:36AM +0200, Peter Krempa wrote: > > On Tue, Apr 06, 2021 at 16:31:32 +0100, Daniel Berrange wrote: > > > PCI devices can be associated with a unique integer index that is > > > exposed via ACPI. In Linux OS with systemd, this value is used for > > > provide a NIC device naming scheme that is stable across changes > > > in PCI slot configuration. > > > > > > Signed-off-by: Daniel P. Berrangé > > > --- > > > docs/formatdomain.rst | 6 +++ > > > docs/schemas/domaincommon.rng | 73 +++ > > > src/conf/device_conf.h| 3 ++ > > > src/conf/domain_conf.c| 12 ++ > > > 4 files changed, 94 insertions(+) > > > > > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst > > > index 7ba32ea9c1..5db0aac77a 100644 > > > --- a/docs/formatdomain.rst > > > +++ b/docs/formatdomain.rst > > > @@ -4363,6 +4363,7 @@ Network interfaces > > > > > > > > > > > > + > > > > > > > > > ... > > > @@ -4389,6 +4390,11 @@ when it's in the reserved VMware range by adding a > > > ``type="static"`` attribute > > > to the element. Note that this attribute is useless if the > > > provided > > > MAC address is outside of the reserved VMWare ranges. > > > > > > +:since:`Since 7.3.0`, one can set the ACPI index against network > > > interfaces. > > > +With some operating systems (eg Linux with systemd), the ACPI index is > > > used > > > +to provide network interface device naming, that is stable across changes > > > +in PCI addresses assigned to the device. > > > > Any range limits or uniqueness requirements worth mentioning? > > Yes, its required to be unique and below (16 * 1024 - 1) because > for some reason QEMU chose to artificially limit its value to > match systemd's limit. This is a bit dubious IMHO, as the host > should not enforce policy for things that are decided by the > guest. 
Dropping the limit would just postpone the error until the guest boots, with the effect that 'onboard' naming won't be used and systemd will fall back to the next available method. Given that systemd is the sole known user of this feature, it seemed better to me to error out at QEMU start rather than wait until the guest boots and let the user figure out what's wrong. If we find another user of the feature that supports the full range, we can drop the limit easily without any compat issues. > > > Regards, > Daniel
Re: [libvirt PATCH 5/6] qemu: probe for "acpi-index" property
On Tue, 6 Apr 2021 16:31:36 +0100 Daniel P. Berrangé wrote: > This property is exposed by QEMU on any PCI device, but we have to pick > some specific device(s) to probe it against. We expect that at least one > of the virtio devices will be present, so probe against them. Would it be useful to expose capability with MachineInfo in QAPI schema? At least with this on QEMU side I can imagine a crude check and error out in case device has acpi-index set but machine doesn't support it. > > Signed-off-by: Daniel P. Berrangé > --- > src/qemu/qemu_capabilities.c | 8 > src/qemu/qemu_capabilities.h | 3 +++ > tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + > 3 files changed, 12 insertions(+) > > diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c > index ea24e2d6a5..f44a06c5c9 100644 > --- a/src/qemu/qemu_capabilities.c > +++ b/src/qemu/qemu_capabilities.c > @@ -625,6 +625,9 @@ VIR_ENUM_IMPL(virQEMUCaps, >"blockdev-backup", >"object.qapified", >"rotation-rate", > + > + /* 400 */ > + "acpi-index", > ); > > > @@ -1363,6 +1366,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioBalloon[] > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > { "free-page-reporting", QEMU_CAPS_VIRTIO_BALLOON_FREE_PAGE_REPORTING, > NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > > @@ -1395,6 +1399,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioBlk[] = { > { "write-cache", QEMU_CAPS_DISK_WRITE_CACHE, NULL }, > { "werror", QEMU_CAPS_STORAGE_WERROR, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioNet[] > = { > @@ -1408,6 +1413,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioNet[] = { > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "failover", QEMU_CAPS_VIRTIO_NET_FAILOVER, NULL }, > { 
"packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsPCIeRootPort[] = { > @@ -1428,6 +1434,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioSCSI[] = { > { "iommu_platform", QEMU_CAPS_VIRTIO_PCI_IOMMU_PLATFORM, NULL }, > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVfioPCI[] = { > @@ -1499,6 +1506,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioGpu[] = { > { "iommu_platform", QEMU_CAPS_VIRTIO_PCI_IOMMU_PLATFORM, NULL }, > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsICH9[] = { > diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h > index a70c00a265..22ff3a2f15 100644 > --- a/src/qemu/qemu_capabilities.h > +++ b/src/qemu/qemu_capabilities.h > @@ -606,6 +606,9 @@ typedef enum { /* virQEMUCapsFlags grouping marker for > syntax-check */ > QEMU_CAPS_OBJECT_QAPIFIED, /* parameters for object-add are formally > described */ > QEMU_CAPS_ROTATION_RATE, /* scsi-disk / ide-drive rotation-rate prop */ > > +/* 400 */ > +QEMU_CAPS_ACPI_INDEX, /* PCI device 'acpi-index' property */ > + > QEMU_CAPS_LAST /* this must always be the last item */ > } virQEMUCapsFlags; > > diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > index 984a2d5896..592560c3ef 100644 > --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > @@ -261,6 +261,7 @@ > > > > + >5002091 >0 >43100242
Re: Ways to deal with broken machine types
On Mon, 29 Mar 2021 15:46:53 +0100 "Dr. David Alan Gilbert" wrote: > * Igor Mammedov (imamm...@redhat.com) wrote: > > On Tue, 23 Mar 2021 17:40:36 + > > Daniel P. Berrangé wrote: > > > > > On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote: > > > > Let me hijack this thread for beyond this case scope. > > > > > > > > I agree that for this particular bug we've done all we could, but > > > > there is broader issue to discuss here. > > > > > > > > We have machine versions to deal with hw compatibility issues and that > > > > covers most of the cases, > > > > but occasionally we notice problem well after release(s), > > > > so users may be stuck with broken VM and need to manually fix > > > > configuration (and/or VM). > > > > Figuring out what's wrong and how to fix it is far from trivial. So > > > > lets discuss if we > > > > can help to ease this pain, yes it will be late for first victims but > > > > it's still > > > > better than never. > > > > > > To summarize the problem situation > > > > > > - We rely on a machine type version to encode a precise guest ABI. > > > - Due a bug, we are in a situation where the same machine type > > >encodes two distinct guest ABIs due to a mistake introduced > > >betwen QEMU N-2 and N-1 > > > - We want to fix the bug in QEMU N > > > - For incoming migration there is no way to distinguish between > > >the ABIs used in N-2 and N-1, to pick the right one > > > > > > So we're left with an unwinnable problem: > > > > > > - Not fixing the bug => > > > > > >a) user migrating N-2 to N-1 have ABI change > > >b) user migrating N-2 to N have ABI change > > >c) user migrating N-1 to N are fine > > > > > > No mitigation for (a) or (b) > > > > > > - Fixing the bug => > > > > > >a) user migrating N-2 to N-1 have ABI change. > > >b) user migrating N-2 to N are fine > > >c) user migrating N-1 to N have ABI change > > > > > > Bad situations (a) and (c) are mitigated by > > > backporting fix to N-1-stable too. 
> > > > > > Generally we have preferred to fix the bug, because we have > > > usually identified them fairly quickly after release, and > > > backporting the fix to stable has been sufficient mitigation > > > against ill effects. Basically the people left broken are a > > > relatively small set out of the total userbase. > > > > > > The real challenge arises when we are slow to identify the > > > problem, such that we have a large number of people impacted. > > > > > > > > > > I'll try to sum up idea Michael suggested (here comes my unorganized > > > > brain-dump), > > > > > > > > 1. We can keep in VM's config QEMU version it was created on > > > >and as minimum warn user with a pointer to known issues if version in > > > >config mismatches version of actually used QEMU, with a knob to > > > > silence > > > >it for particular mismatch. > > > > > > > > When an issue becomes know and resolved we know for sure how and what > > > > changed and embed instructions on what options to use for fixing up VM's > > > > config to preserve old HW config depending on QEMU version VM was > > > > installed on. > > > > > > > some more ideas: > > > >2. let mgmt layer to keep fixup list and apply them to config if > > > > available > > > >(user would need to upgrade mgmt or update fixup list somehow) > > > >3. let mgmt layer to pass VM's QEMU version to currently used QEMU, > > > > so > > > > that QEMU could maintain and apply fixups based on QEMU version + > > > > machine type. > > > > The user will have to upgrade to newer QEMU to get/use new > > > > fixups. > > > > > > The nice thing about machine type versioning is that we are treating the > > > versions as opaque strings which represent a specific ABI, regardless of > > > the QEMU version. This means that even if distros backport fixes for bugs > > > or even new features, the machine type compatibility check remains a >
Re: Ways to deal with broken machine types
On Tue, 23 Mar 2021 17:40:36 + Daniel P. Berrangé wrote: > On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote: > > Let me hijack this thread for beyond this case scope. > > > > I agree that for this particular bug we've done all we could, but > > there is broader issue to discuss here. > > > > We have machine versions to deal with hw compatibility issues and that > > covers most of the cases, > > but occasionally we notice problem well after release(s), > > so users may be stuck with broken VM and need to manually fix configuration > > (and/or VM). > > Figuring out what's wrong and how to fix it is far from trivial. So lets > > discuss if we > > can help to ease this pain, yes it will be late for first victims but it's > > still > > better than never. > > To summarize the problem situation > > - We rely on a machine type version to encode a precise guest ABI. > - Due a bug, we are in a situation where the same machine type >encodes two distinct guest ABIs due to a mistake introduced >betwen QEMU N-2 and N-1 > - We want to fix the bug in QEMU N > - For incoming migration there is no way to distinguish between >the ABIs used in N-2 and N-1, to pick the right one > > So we're left with an unwinnable problem: > > - Not fixing the bug => > >a) user migrating N-2 to N-1 have ABI change >b) user migrating N-2 to N have ABI change >c) user migrating N-1 to N are fine > > No mitigation for (a) or (b) > > - Fixing the bug => > >a) user migrating N-2 to N-1 have ABI change. >b) user migrating N-2 to N are fine >c) user migrating N-1 to N have ABI change > > Bad situations (a) and (c) are mitigated by > backporting fix to N-1-stable too. > > Generally we have preferred to fix the bug, because we have > usually identified them fairly quickly after release, and > backporting the fix to stable has been sufficient mitigation > against ill effects. Basically the people left broken are a > relatively small set out of the total userbase. 
> > The real challenge arises when we are slow to identify the > problem, such that we have a large number of people impacted. > > > > I'll try to sum up idea Michael suggested (here comes my unorganized > > brain-dump), > > > > 1. We can keep in VM's config QEMU version it was created on > >and as minimum warn user with a pointer to known issues if version in > >config mismatches version of actually used QEMU, with a knob to silence > >it for particular mismatch. > > > > When an issue becomes know and resolved we know for sure how and what > > changed and embed instructions on what options to use for fixing up VM's > > config to preserve old HW config depending on QEMU version VM was installed > > on. > > > some more ideas: > >2. let mgmt layer to keep fixup list and apply them to config if > > available > >(user would need to upgrade mgmt or update fixup list somehow) > >3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so > > that QEMU could maintain and apply fixups based on QEMU version + > > machine type. > > The user will have to upgrade to newer QEMU to get/use new fixups. > > The nice thing about machine type versioning is that we are treating the > versions as opaque strings which represent a specific ABI, regardless of > the QEMU version. This means that even if distros backport fixes for bugs > or even new features, the machine type compatibility check remains a > simple equality comparsion. > > As soon as you introduce the QEMU version though, we have created a > large matrix for compatibility. This matrix is expanded if a distro > chooses to backport fixes for any of the machine type bugs to their > stable streams. This can get particularly expensive when there are > multiple streams a distro is maintaining. > > *IF* the original N-1 qemu has a property that could be queried by > the mgmt app to identify a machine type bug, then we could potentially > apply a fixup automatically. 
> > eg query-machines command in QEMU version N could report against > "pc-i440fx-5.0", that there was a regression fix that has to be > applied if property "foo" had value "bar". > > Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N. > It can query the value of "foo" on the source QEMU with qom-get. > It now knows whether it has to override this property "foo" when > spawning QEMU N on the target.
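Daniel's proposal could look roughly like this on the mgmt side. "foo"/"bar" are the placeholders from the mail, not real properties; qom-get is a real QMP command, but the decision logic here is a sketch of the idea only:

```python
def fixup_override(qom_get_reply, prop="foo", buggy_value="bar"):
    """Decide whether the target QEMU needs a compat override: the
    mgmt app runs qom-get for the flagged property on the source and,
    if the regressed value is still in use, pins it on the destination
    command line so the guest ABI is preserved across the fix."""
    value = qom_get_reply.get("return")
    if value == buggy_value:
        return {prop: buggy_value}   # keep the regressed ABI alive
    return {}                        # source already has the fixed ABI
```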
Ways to deal with broken machine types
On Tue, 23 Mar 2021 16:04:11 +0100 Thomas Lamprecht wrote: > On 23.03.21 15:55, Vitaly Cheptsov wrote: > >> On 23 March 2021, at 17:48, Michael S. Tsirkin wrote: > >> > >> The issue is with people who installed a VM using 5.1 qemu, > >> migrated to 5.2, booted there and set a config on a device > >> e.g. IP on a NIC. > >> They now have a 5.1 machine type but changing uid back > >> like we do will break these VMs. > >> > >> Unlikely to be common but let's at least create a way for these people > >> to use these VMs. > >> > > They can simply set the 5.2 VM version in such a case. I do not want to > let this legacy hack be enabled in any modern QEMU VM version, as it > violates the ACPI specification and makes life more difficult for various > other software like bootloaders and operating systems. > > Yeah, here I agree with Vitaly, if they already used 5.2 and made some > configurations > for those "new" devices they can just keep using 5.2? > > If some of the devices got configured on 5.1 and some on 5.2 there's nothing > we can > do anyway, from a QEMU POV - there the user always needs to choose one machine > version > and fix up the devices configured while on the other machine. According to testing, it appears that the issue affects virtio drivers, so it may lead to a failure to boot the guest (and there was at least one report about virtio-scsi being affected). Let me hijack this thread beyond the scope of this case. I agree that for this particular bug we've done all we could, but there is a broader issue to discuss here. We have machine versions to deal with hardware compatibility issues, and that covers most of the cases, but occasionally we notice a problem well after release(s), so users may be stuck with a broken VM and need to fix the configuration (and/or VM) manually. Figuring out what's wrong and how to fix it is far from trivial. So let's discuss whether we can help to ease this pain; yes, it will be too late for the first victims, but it's still better than never. 
I'll try to sum up the idea Michael suggested (here comes my unorganized brain-dump):

1. We can keep in the VM's config the QEMU version it was created on, and as a minimum warn the user with a pointer to known issues if the version in the config mismatches the version of the QEMU actually used, with a knob to silence the warning for a particular mismatch.

Once an issue becomes known and resolved, we know for sure how and what changed, and can embed instructions on what options to use for fixing up the VM's config to preserve the old HW config, depending on the QEMU version the VM was installed on.

Some more ideas:

2. Let the mgmt layer keep a fixup list and apply it to the config if available (the user would need to upgrade mgmt or update the fixup list somehow).

3. Let the mgmt layer pass the VM's QEMU version to the currently used QEMU, so that QEMU could maintain and apply fixups based on QEMU version + machine type. The user will have to upgrade to a newer QEMU to get/use new fixups.

In my opinion, both would lead to an explosion of 'possibly needed' properties for each change we introduce in hw/firmware (read: ACPI), and very possibly a lot of conditional branches in QEMU code. I'm afraid it would make QEMU harder to maintain => more bugs in the future. It would also blow up the test matrix for downstreams who care about testing.

If we proactively gate changes on properties, we can just update fixup lists in mgmt without needing to update QEMU (aka Insights rules), at the cost of complexity on the QEMU side. Alternatively, we can be conservative in spawning new properties, meaning we create them only when an issue is fixed and require users to update QEMU so that the fixups can be applied to the VM.

Feel free to shoot the messenger down, or suggest ways we can deal with the problem.
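Idea 2 above could be a mgmt-layer table keyed by the QEMU version a VM was created on plus its machine type. The entries and property names below are invented purely for illustration:

```python
# Hypothetical fixup table for idea 2: each entry pins properties to the
# values the VM actually saw when it was installed, so a later QEMU with
# the bug fixed can still reproduce the old guest ABI.
FIXUPS = {
    ("5.1.0", "pc-i440fx-5.1"): {"x-example-compat-prop": "old"},
}

def fixups_for(created_on_version, machine_type):
    """Look up the property overrides to splice into the VM config."""
    return FIXUPS.get((created_on_version, machine_type), {})
```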
Re: [PATCH] Deprecate pmem=on with non-DAX capable backend file
On Mon, 11 Jan 2021 15:33:32 -0500 Igor Mammedov wrote: > It is not safe to pretend that emulated NVDIMM supports > persistence while backend actually failed to enable it > and used non-persistent mapping as fall back. > Instead of falling-back, QEMU should be more strict and > error out with clear message that it's not supported. > So if user asks for persistence (pmem=on), they should > store backing file on NVDIMM. > > Signed-off-by: Igor Mammedov > Reviewed-by: Philippe Mathieu-Daudé > --- > v2: > rephrase deprecation comment and warning message > (Philippe Mathieu-Daudé ) I've posted this as v1 though it's v2 and it looks like it fell through the cracks; can someone pick it up if it looks fine, please? > --- > docs/system/deprecated.rst | 17 + > util/mmap-alloc.c | 3 +++ > 2 files changed, 20 insertions(+) > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > index bacd76d7a5..e79fb02b3a 100644 > --- a/docs/system/deprecated.rst > +++ b/docs/system/deprecated.rst > @@ -327,6 +327,23 @@ The Raspberry Pi machines come in various models (A, A+, > B, B+). To be able > to distinguish which model QEMU is implementing, the ``raspi2`` and > ``raspi3`` > machines have been renamed ``raspi2b`` and ``raspi3b``. > > +Backend options > +--- > + > +Using non-persistent backing file with pmem=on (since 6.0) > +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' > + > +This option is used when ``memory-backend-file`` is consumed by emulated > NVDIMM > +device. However enabling ``memory-backend-file.pmem`` option, when backing > file > +is (a) not DAX capable or (b) not on a filesystem that supports direct mapping > +of persistent memory, is not safe and may lead to data loss or corruption in > case > +of host crash. 
> +Options are: > +- modify VM configuration to set ``pmem=off`` to continue using fake > NVDIMM > + (without persistence guaranties) with backing file on non DAX storage > +- move backing file to NVDIMM storage and keep ``pmem=on`` > + (to have NVDIMM with persistence guaranties). > + > Device options > -- > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c > index 27dcccd8ec..0388cc3be2 100644 > --- a/util/mmap-alloc.c > +++ b/util/mmap-alloc.c > @@ -20,6 +20,7 @@ > #include "qemu/osdep.h" > #include "qemu/mmap-alloc.h" > #include "qemu/host-utils.h" > +#include "qemu/error-report.h" > > #define HUGETLBFS_MAGIC 0x958458f6 > > @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, > "crash.\n", file_name); > g_free(proc_link); > g_free(file_name); > +warn_report("Using non DAX backing file with 'pmem=on' option" > +" is deprecated"); > } > /* > * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
Re: [PATCH 1/2] qemu_capabilities: Introduce QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID
On Tue, 12 Jan 2021 20:59:11 +0100 Peter Krempa wrote: > On Tue, Jan 12, 2021 at 20:24:44 +0100, Igor Mammedov wrote: > > On Tue, 12 Jan 2021 18:41:38 + > > Daniel P. Berrangé wrote: > > > > > On Tue, Jan 12, 2021 at 07:28:45PM +0100, Peter Krempa wrote: > > > > On Tue, Jan 12, 2021 at 19:20:58 +0100, Igor Mammedov wrote: > > > > > On Tue, 12 Jan 2021 12:35:19 +0100 > > > > > Peter Krempa wrote > > [...] > > > > Yeah it is pretty dubious on the QEMU side to have used an "x-" prefix > > > here at all, when use of this option is mandatory to make migration > > > work :-( > > > > if the general consensus is to drop the prefix, I can post a QEMU patch to do so > > and let downstream(s) carry the burden. > > It really depends on the situation, because the commit messages don't > seem to describe it satisfactorily. > > Basically we don't want to ever use a qemu property knob, which qemu is > free to change arbitrarily. > > If the property is to be used with any upcoming qemu version we must get > a guarantee that it will not change. There are two options basically: > > 1) 'x-' is dropped > 1a) we will use it with qemu-6.0 and later > ( everything is clean, but users will have to update qemu to fix it ) I have thought about it some more, (modulo the downstream issue) dropping the prefix will effectively exclude old QEMU (5.0-5.2) even though the feature is available there. > 1b) we will carry code which will use the 'x-' prefixed version from its > inception until qemu-5.2, when we will hard-mask it out and add > plenty of comments outlining that this is not what we do normally > (it will be okay for past releases, since they will not change) 5.2 is not enough, it should be carried as long as the 4.0 machine type exists. On the QEMU side, once the 4.0 machine type is removed, we can deprecate and remove the no-longer-needed option, so libvirt (with these patches) would see that it no longer exists and not put it on the CLI anymore. Only after that is it probably ok to drop the code for it. 
> > 2) qemu declares the option stable with the 'x-' prefix >We'll require that any place even in the code which declares the >option has an appropriate comment preventing anybody from changing >it. > >We'll then add also cautionary comments discouraging use of it. I've just resent v2 of the QEMU patch that incorporates your suggestions. > > 3) qemu fixes the issue without libvirt's involvement if it were possible without the option, I'd go for it in the first place. Unfortunately, it's too late for that now. > For us really 1a) and 3 is acceptable without any comments. Other > options will require extraordinary measures to prevent using this as > prior art in using any other x-prefixed features from qemu. > > in 1a) case, downstreams can obviously backport the qemu patch renaming > the feature and libvirt will require no change at all > > Now the question is whether we want to make migration work between the > affected releases which will depend on what to use. If we can help it, then yes. That's why I resent the QEMU patch keeping the 'x-' prefix (with your feedback included).
Re: [PATCH 1/2] qemu_capabilities: Introduce QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID
On Tue, 12 Jan 2021 18:41:38 + Daniel P. Berrangé wrote: > On Tue, Jan 12, 2021 at 07:28:45PM +0100, Peter Krempa wrote: > > On Tue, Jan 12, 2021 at 19:20:58 +0100, Igor Mammedov wrote: > > > On Tue, 12 Jan 2021 12:35:19 +0100 > > > Peter Krempa wrote: > > > > > > > On Tue, Jan 12, 2021 at 12:29:58 +0100, Michal Privoznik wrote: > > > > > On 1/12/21 12:19 PM, Peter Krempa wrote: > > > > > > On Tue, Jan 12, 2021 at 09:29:49 +0100, Michal Privoznik wrote: > > > > > > > This capability tracks whether memory-backend-file has > > > > > > > "x-use-canonical-path-for-ramblock-id" attribute. Introduced into > > > > > > > QEMU by commit v4.0.0-rc0~189^2. While "x-" prefix is considered > > > > > > > > > > > > > > > > > > > Please use a commit hash instead of this. > > > > > > > > > > > > > experimental or internal to QEMU, the next commit justifies its > > > > > > > use. > > > > > > > > > > > > NACK unless qemu adds a statement to their code and documentation > > > > > > that > > > > > > the this property is considered stable despite the 'x-prefix' and > > > > > > you > > > > > > add a link to the appropriate qemu upstream commit once it's done. > > > > > > > > > > > > We don't want to depend on experimental stuff so we need a strong > > > > > > excuse. > > > > > > > > > > > > > > > > That's done in the next commit. Do you want me to copy it here too? I > > > > > figured I'd put the justification where I'm actually setting the > > > > > internal > > > > > knob. > > > > > > > > Yes, because this is also mentioning the an 'x-' prefixed property. I > > > > want to be absolutely clear in any places (including a comment in the > > > > code, which you also should add into the capability code) that this is > > > > extraordinary circumstance and that qemu is actually considering that > > > > property stable. > > > > > > the only reason to keep x- prefix in this case is to cause less issues for > > > downstream QEMUs. 
Since this compat property is copied to their own > > > machine types. > > > If we keep the prefix downstream doesn't have to do anything, if we rename it, > > > then downstreams have to carry a separate patch that does the same for > > > their old machine types. > > That would be okay if it's limited to past versions, but in this > > instance it is not. Allowing x-prefixed properties for any future > > release is a dangerous precedent. If we want to allow to detect the > > capability also for future releases, we must declare that it's for a very > > particular reason and also that qemu will not delete it at will. > > > > This is to prevent any future discussions of unwarranted usage of > > x-prefixed properties in libvirt. > > Yeah it is pretty dubious on the QEMU side to have used an "x-" prefix > here at all, when use of this option is mandatory to make migration > work :-( if the general consensus is to drop the prefix, I can post a QEMU patch to do so and let downstream(s) carry the burden. > > Regards, > Daniel
Re: [PATCH 2/2] qemu: Do not Use canonical path for system memory
On Tue, 12 Jan 2021 09:29:50 +0100 Michal Privoznik wrote: > In commit v6.9.0-rc1~450 I've adapted libvirt to QEMU's deprecation of > -mem-path and -mem-prealloc and switched to memory-backend-* even for > system memory. My claim was that that's what QEMU does under the hood > anyway. And indeed it was: see QEMU commit v5.0.0-rc0~75^2~1^2~76 and > look at function create_default_memdev(). > > However, then commit v5.0.0-rc1~11^2~3 was merged into QEMU. While it > was fixing a bug, it also changed the create_default_memdev() function > in which it started turning off use of canonical path (by setting > "x-use-canonical-path-for-ramblock-id" attribute to false). This wasn't > documented until QEMU commit XXX. The path affects migration - the same > path has to be used on the source and on the destination. Therefore, if > there is old guest started with '-m X' it has "pc.ram" block which > doesn't use canonical path and thus when migrating to newer QEMU which > uses memory-backend-* we have to turn off the canonical path explicitly. > Otherwise, "/objects/pc.ram" path would be expected by QEMU which > doesn't match the source. > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1912201 > Signed-off-by: Michal Privoznik > --- > > I'll replace both occurrences of 'QEMU commit XXX' once QEMU patch is > merged. 
> > src/qemu/qemu_command.c | 30 --- > src/qemu/qemu_command.h | 3 +- > src/qemu/qemu_hotplug.c | 2 +- > .../hugepages-memaccess3.x86_64-latest.args | 4 +-- > 4 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > index 6f970a3128..b99d4e5faf 100644 > --- a/src/qemu/qemu_command.c > +++ b/src/qemu/qemu_command.c > @@ -2950,7 +2950,8 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > qemuDomainObjPrivatePtr priv, > const virDomainDef *def, > const virDomainMemoryDef *mem, > -bool force) > +bool force, > +bool systemMemory) > { > const char *backendType = "memory-backend-file"; > virDomainNumatuneMemMode mode; > @@ -2967,6 +2968,7 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > bool needHugepage = !!pagesize; > bool useHugepage = !!pagesize; > int discard = mem->discard; > +bool useCanonicalPath = true; > > /* The difference between @needHugepage and @useHugepage is that the > latter > * is true whenever huge page is defined for the current memory cell. > @@ -3081,6 +3083,9 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) > return -1; > > +if (systemMemory) > +useCanonicalPath = false; > + > } else if (useHugepage || mem->nvdimmPath || memAccess || > def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) { > > @@ -3122,10 +3127,27 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > > if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) > return -1; > + > +if (systemMemory) > +useCanonicalPath = false; > + > } else { > backendType = "memory-backend-ram"; > } > > +/* This is a terrible hack, but unfortunately there is no better way. > + * The replacement for '-m X' argument is not simple '-machine > + * memory-backend' and '-object memory-backend-*,size=X' (which was the > + * idea). 
This is because of create_default_memdev() in QEMU sets > + * 'x-use-canonical-path-for-ramblock-id' attribute to false and is > + * documented in QEMU in qemu-options.hx under 'memory-backend'. > + * See QEMU commit XXX. > + */ > +if (!useCanonicalPath && > +virQEMUCapsGet(priv->qemuCaps, > QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID) && > +virJSONValueObjectAdd(props, > "b:x-use-canonical-path-for-ramblock-id", false, NULL) < 0) > +return -1; is it possible to do it only for old machine types <= 4.0, to limit hack exposure? > if (!priv->memPrealloc && > virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) > return -1; > @@ -3237,7 +3259,7 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, > mem.info.alias = alias; > > if ((rc = qemuBuildMemoryBackendProps(&props, alias, cfg, > - priv, def, &mem, false)) < 0) > + priv, def, &mem, false, false)) < > 0) > return -1; > > if (virQEMUBuildObjectCommandlineFromJSON(buf, props) < 0) > @@ -3266,7 +3288,7 @@ qemuBuildMemoryDimmBackendStr(virBufferPtr buf, > alias = g_strdup_printf("mem%s", mem->info.alias); > > if (qemuBuildMemoryBacken
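Condensed into Python pseudocode, the decision the hunk above makes looks like this (the field and property names follow the libvirt patch, but this is only a sketch of the logic, not libvirt code):

```python
# Sketch of the logic in qemuBuildMemoryBackendProps above: for system
# memory ("pc.ram") on a QEMU that reports the property, turn the
# canonical ramblock path off so migration from '-m X' guests still works.

CANONICAL_PATH_PROP = "x-use-canonical-path-for-ramblock-id"

def backend_props(system_memory, qemu_caps, size="4G"):
    props = {"size": size}
    if system_memory and CANONICAL_PATH_PROP in qemu_caps:
        # keep the legacy "pc.ram" ramblock id instead of "/objects/pc.ram"
        props[CANONICAL_PATH_PROP] = False
    return props
```

Igor's question below (limiting this to machine types <= 4.0) would amount to adding a machine-type check to the `if` above.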
Re: [PATCH 1/2] qemu_capabilities: Introduce QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID
On Tue, 12 Jan 2021 12:35:19 +0100 Peter Krempa wrote: > On Tue, Jan 12, 2021 at 12:29:58 +0100, Michal Privoznik wrote: > > On 1/12/21 12:19 PM, Peter Krempa wrote: > > > On Tue, Jan 12, 2021 at 09:29:49 +0100, Michal Privoznik wrote: > > > > This capability tracks whether memory-backend-file has > > > > "x-use-canonical-path-for-ramblock-id" attribute. Introduced into > > > > QEMU by commit v4.0.0-rc0~189^2. While "x-" prefix is considered > > > > > > Please use a commit hash instead of this. > > > > > > > experimental or internal to QEMU, the next commit justifies its > > > > use. > > > > > > NACK unless qemu adds a statement to their code and documentation that > > > this property is considered stable despite the 'x-' prefix and you > > > add a link to the appropriate qemu upstream commit once it's done. > > > > > > We don't want to depend on experimental stuff so we need a strong > > > excuse. > > > > > > > That's done in the next commit. Do you want me to copy it here too? I > > figured I'd put the justification where I'm actually setting the internal > > knob. > > Yes, because this is also mentioning an 'x-' prefixed property. I > want to be absolutely clear in all places (including a comment in the > code, which you also should add into the capability code) that this is > an extraordinary circumstance and that qemu is actually considering that > property stable. the only reason to keep the x- prefix in this case is to cause fewer issues for downstream QEMUs, since this compat property is copied to their own machine types. If we keep the prefix, downstream doesn't have to do anything; if we rename it, then downstreams have to carry a separate patch that does the same for their old machine types. > I want to prevent that this commit will be used as an excuse to depend > on experimental properties which are not actually considered > non-experimental. >
[PATCH] Deprecate pmem=on with non-DAX capable backend file
It is not safe to pretend that emulated NVDIMM supports persistence while backend actually failed to enable it and used non-persistent mapping as fall back. Instead of falling-back, QEMU should be more strict and error out with clear message that it's not supported. So if user asks for persistence (pmem=on), they should store backing file on NVDIMM. Signed-off-by: Igor Mammedov Reviewed-by: Philippe Mathieu-Daudé --- v2: rephrase deprecation comment and warning message (Philippe Mathieu-Daudé ) --- docs/system/deprecated.rst | 17 + util/mmap-alloc.c | 3 +++ 2 files changed, 20 insertions(+) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index bacd76d7a5..e79fb02b3a 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -327,6 +327,23 @@ The Raspberry Pi machines come in various models (A, A+, B, B+). To be able to distinguish which model QEMU is implementing, the ``raspi2`` and ``raspi3`` machines have been renamed ``raspi2b`` and ``raspi3b``. +Backend options +--- + +Using non-persistent backing file with pmem=on (since 6.0) +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +This option is used when ``memory-backend-file`` is consumed by emulated NVDIMM +device. However enabling ``memory-backend-file.pmem`` option, when backing file +is (a) not DAX capable or (b) not on a filesystem that supports direct mapping +of persistent memory, is not safe and may lead to data loss or corruption in case +of host crash. +Options are: +- modify VM configuration to set ``pmem=off`` to continue using fake NVDIMM + (without persistence guarantees) with backing file on non DAX storage +- move backing file to NVDIMM storage and keep ``pmem=on`` + (to have NVDIMM with persistence guarantees). 
+ Device options -- diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c index 27dcccd8ec..0388cc3be2 100644 --- a/util/mmap-alloc.c +++ b/util/mmap-alloc.c @@ -20,6 +20,7 @@ #include "qemu/osdep.h" #include "qemu/mmap-alloc.h" #include "qemu/host-utils.h" +#include "qemu/error-report.h" #define HUGETLBFS_MAGIC 0x958458f6 @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, "crash.\n", file_name); g_free(proc_link); g_free(file_name); +warn_report("Using non DAX backing file with 'pmem=on' option" +" is deprecated"); } /* * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC, -- 2.27.0
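The rule the patch enforces can be illustrated with a small sketch. Real QEMU detects DAX capability by attempting an mmap with MAP_SYNC | MAP_SHARED_VALIDATE; checking mount options for ``dax``, as done below, is only a rough stand-in for illustration:

```python
# Rough illustration only: approximate "is this backing file on
# DAX-capable storage?" by looking for the 'dax' mount option of the
# longest matching mountpoint. QEMU itself tests mmap(MAP_SYNC) instead.

def pmem_backing_ok(path, mounts):
    """mounts: iterable of (mountpoint, options) tuples."""
    matching = [m for m in mounts if path.startswith(m[0])]
    if not matching:
        return False
    mountpoint, options = max(matching, key=lambda m: len(m[0]))
    return "dax" in options.split(",")
```

With ``pmem=on`` and a backing file for which such a check fails, the patch above emits a deprecation warning; a later release can then turn that into a hard error.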
Re: [PATCH] Deprecate pmem=on with non-DAX capable backend file
On Tue, 29 Dec 2020 19:04:58 +0100 Philippe Mathieu-Daudé wrote: > On 12/29/20 6:29 PM, Igor Mammedov wrote: > > It is not safe to pretend that emulated NVDIMM supports > > persistence while backend actually failed to enable it > > and used non-persistent mapping as fall back. > > Instead of falling-back, QEMU should be more strict and > > error out with clear message that it's not supported. > > So if user asks for persistence (pmem=on), they should > > store backing file on NVDIMM. > > > > Signed-off-by: Igor Mammedov > > --- > > docs/system/deprecated.rst | 14 ++ > > util/mmap-alloc.c | 3 +++ > > 2 files changed, 17 insertions(+) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index bacd76d7a5..ba4f6ed2fe 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -327,6 +327,20 @@ The Raspberry Pi machines come in various models (A, > > A+, B, B+). To be able > > to distinguish which model QEMU is implementing, the ``raspi2`` and > > ``raspi3`` > > machines have been renamed ``raspi2b`` and ``raspi3b``. > > > > +Backend options > > +--- > > + > > +Using non-persistent backing file with pmem=on (since 6.0) > > +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' > > + > > +This option is used when ``memory-backend-file`` is consumed by emulated > > NVDIMM > > +device. However enabling ``memory-backend-file.pmem`` option, when backing > > file > > +is not DAX capable or not on a filesystem that support direct mapping of > > persistent > > Maybe clearer enumerating? As: > "is a) not DAX capable or b) not on a filesystem that support direct > mapping of persistent" will change it to your variant in v2 > > > +memory, is not safe and may lead to data loss or corruption in case of > > host crash. > > +Using pmem=on option with such file will return error, instead of a > > warning. > > Not sure the difference between warn/err is important in the doc. 
not many care about warnings until QEMU starts fine, I've mentioned the error here so that whoever reads this would know what to expect > > > +Options are to move backing file to NVDIMM storage or modify VM > > configuration > > +to set ``pmem=off`` to continue using fake NVDIMM without persistence > > guaranties. > > Maybe: > > The possibilities to continue using fake NVDIMM (without persistence > guaranties) are: > - move backing file to NVDIMM storage > - modify VM configuration to set ``pmem=off`` only the latter is faking nvdimm; the first is a properly emulated one with persistence guarantees. Maybe: Options are: - modify VM configuration to set ``pmem=off`` to continue using fake NVDIMM (without persistence guarantees) with backing file on non DAX storage - move backing file to NVDIMM storage and keep ``pmem=on``, to have NVDIMM with persistence guarantees. > > + > > Device options > > -- > > > > diff --git a/util/mmap-alloc.c > > index 27dcccd8ec..d226273a98 100644 > > --- a/util/mmap-alloc.c > > +++ b/util/mmap-alloc.c > > @@ -20,6 +20,7 @@ > > #include "qemu/osdep.h" > > #include "qemu/mmap-alloc.h" > > #include "qemu/host-utils.h" > > +#include "qemu/error-report.h" > > > > #define HUGETLBFS_MAGIC 0x958458f6 > > > > @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, > > "crash.\n", file_name); > > g_free(proc_link); > > g_free(file_name); > > +warn_report("Deprecated using non DAX backing file with" > > +" pmem=on option"); > > Maybe "Using non DAX backing file with 'pmem=on' option is deprecated"? ok > > Beside the nitpicking comments, > Reviewed-by: Philippe Mathieu-Daudé > > > } > > /* > > * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC, > > >
[PATCH] Deprecate pmem=on with non-DAX capable backend file
It is not safe to pretend that emulated NVDIMM supports persistence while backend actually failed to enable it and used non-persistent mapping as fall back. Instead of falling-back, QEMU should be more strict and error out with clear message that it's not supported. So if user asks for persistence (pmem=on), they should store backing file on NVDIMM. Signed-off-by: Igor Mammedov --- docs/system/deprecated.rst | 14 ++ util/mmap-alloc.c | 3 +++ 2 files changed, 17 insertions(+) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index bacd76d7a5..ba4f6ed2fe 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -327,6 +327,20 @@ The Raspberry Pi machines come in various models (A, A+, B, B+). To be able to distinguish which model QEMU is implementing, the ``raspi2`` and ``raspi3`` machines have been renamed ``raspi2b`` and ``raspi3b``. +Backend options +--- + +Using non-persistent backing file with pmem=on (since 6.0) +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +This option is used when ``memory-backend-file`` is consumed by emulated NVDIMM +device. However enabling ``memory-backend-file.pmem`` option, when backing file +is not DAX capable or not on a filesystem that support direct mapping of persistent +memory, is not safe and may lead to data loss or corruption in case of host crash. +Using pmem=on option with such file will return error, instead of a warning. +Options are to move backing file to NVDIMM storage or modify VM configuration +to set ``pmem=off`` to continue using fake NVDIMM without persistence guaranties. 
+ Device options -- diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c index 27dcccd8ec..d226273a98 100644 --- a/util/mmap-alloc.c +++ b/util/mmap-alloc.c @@ -20,6 +20,7 @@ #include "qemu/osdep.h" #include "qemu/mmap-alloc.h" #include "qemu/host-utils.h" +#include "qemu/error-report.h" #define HUGETLBFS_MAGIC 0x958458f6 @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, "crash.\n", file_name); g_free(proc_link); g_free(file_name); +warn_report("Deprecated using non DAX backing file with" +" pmem=on option"); } /* * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC, -- 2.27.0
[RFC 3/5] pci: introduce acpi-index property for PCI device
In the x86/ACPI world, since systemd v197, linux distros are using predictable network interface naming. On QEMU based VMs this results in the path-based naming scheme, which names network interfaces based on PCI topology. With this, one has to plug the NIC into exactly the same bus/slot that was used when the disk image was first provisioned/configured, or one risks losing network configuration due to the NIC being renamed to match the actually used topology. That also restricts the freedom to reshape the PCI configuration of the VM without the need to reconfigure the guest image. systemd also offers an "onboard" naming scheme, which is preferred over the PCI slot/topology one, provided that the firmware implements: " PCI Firmware Specification 3.1 4.6.7. _DSM for Naming a PCI or PCI Express Device Under Operating Systems " which allows assigning a user-defined index to a PCI device, which systemd will use to name the NIC. For example, using -device e1000,acpi-index=100 the guest will rename the NIC to 'eno100', where 'eno' is the default prefix for the "onboard" naming scheme. This doesn't require any advance configuration on the guest side. The hope is that 'acpi-index' will be easier to consume by the management layer, compared to forcing a specific PCI topology and/or having several disk image templates for different topologies, and that it will help to simplify the process of spawning a VM from the same template without the need to reconfigure the guest network configuration. this patch adds the 'acpi-index'* property and wires up (abuses) unused pci hotplug registers to pass the index value to AML code at runtime. A following patch will add the corresponding _DSM code and wire it up to PCI devices described in ACPI. 
*) name comes from linux kernel terminology Signed-off-by: Igor Mammedov --- CC: libvir-list@redhat.com include/hw/acpi/pcihp.h | 7 ++- include/hw/pci/pci.h| 1 + hw/acpi/pci.c | 6 ++ hw/acpi/pcihp.c | 25 - hw/i386/acpi-build.c| 10 ++ hw/pci/pci.c| 1 + 6 files changed, 48 insertions(+), 2 deletions(-) diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h index dfd375820f..72d1773ca1 100644 --- a/include/hw/acpi/pcihp.h +++ b/include/hw/acpi/pcihp.h @@ -46,6 +46,7 @@ typedef struct AcpiPciHpPciStatus { typedef struct AcpiPciHpState { AcpiPciHpPciStatus acpi_pcihp_pci_status[ACPI_PCIHP_MAX_HOTPLUG_BUS]; uint32_t hotplug_select; +uint32_t acpi_index; PCIBus *root; MemoryRegion io; bool legacy_piix; @@ -71,6 +72,8 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool acpihp_root_off); extern const VMStateDescription vmstate_acpi_pcihp_pci_status; +bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id); + #define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp) \ VMSTATE_UINT32_TEST(pcihp.hotplug_select, state, \ test_pcihp), \ @@ -78,6 +81,8 @@ extern const VMStateDescription vmstate_acpi_pcihp_pci_status; ACPI_PCIHP_MAX_HOTPLUG_BUS, \ test_pcihp, 1, \ vmstate_acpi_pcihp_pci_status, \ - AcpiPciHpPciStatus) + AcpiPciHpPciStatus), \ +VMSTATE_UINT32_TEST(pcihp.acpi_index, state, \ +vmstate_acpi_pcihp_use_acpi_index) #endif diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 259f9c992d..e592532558 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -357,6 +357,7 @@ struct PCIDevice { /* ID of standby device in net_failover pair */ char *failover_pair_id; +uint32_t acpi_index; }; void pci_register_bar(PCIDevice *pci_dev, int region_num, diff --git a/hw/acpi/pci.c b/hw/acpi/pci.c index 9510597a19..07d5101d83 100644 --- a/hw/acpi/pci.c +++ b/hw/acpi/pci.c @@ -27,6 +27,7 @@ #include "hw/acpi/aml-build.h" #include "hw/acpi/pci.h" #include "hw/pci/pcie_host.h" +#include "hw/acpi/pcihp.h" void build_mcfg(GArray *table_data, BIOSLinker *linker, 
AcpiMcfgInfo *info) { @@ -59,3 +60,8 @@ void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info) "MCFG", table_data->len - mcfg_start, 1, NULL, NULL); } +bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id) +{ + AcpiPciHpState *s = opaque; + return s->acpi_index; +} diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c index 9dc4d3e2db..9634567e3a 100644 --- a/hw/acpi/pcihp.c +++ b/hw/acpi/pcihp.c @@ -347,7 +347,8 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) trace_acpi_pci_down_read(val); break; case PCI_EJ_BASE: -/* No feature defined yet */ +val = s->acpi_index; +s->acpi_index = 0; trace_acpi_pci_features_read(val);
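The naming behaviour described in the commit message can be sketched as follows; this mirrors systemd's scheme only loosely (the real implementation lives in systemd's udev net_id builtin) and is not its actual code:

```python
# Loose sketch of systemd's predictable-NIC-naming preference: the
# "onboard" scheme (eno<index>, fed by the firmware's ACPI index via the
# _DSM above) beats the PCI-path scheme (enp<bus>s<slot>).

def predictable_name(acpi_index=None, pci_bus=0, pci_slot=0):
    if acpi_index:  # e.g. set via: -device e1000,acpi-index=100
        return f"eno{acpi_index}"
    return f"enp{pci_bus}s{pci_slot}"
```

This is why the same guest image keeps its network configuration regardless of where the NIC lands in the PCI topology, as long as the same acpi-index is passed.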
Re: [PATCH] qemu: Relax memory pre-allocation rules
On Mon, 30 Nov 2020 11:14:20 + Daniel P. Berrangé wrote: > On Mon, Nov 30, 2020 at 11:48:28AM +0100, Michal Privoznik wrote: > > On 11/30/20 11:16 AM, Daniel P. Berrangé wrote: > > > On Mon, Nov 30, 2020 at 11:06:14AM +0100, Michal Privoznik wrote: > > > > Currently, we configure QEMU to prealloc memory almost by > > > > default. Well, by default for NVDIMMs, hugepages and if user > > > > asked us to (via ``<memoryBacking/>``). > > > > > > > > However, there are two cases where this approach is not the best: > > > > > > > > 1) in case when guest's NVDIMM is backed by real life NVDIMM. In > > > > this case users should put ``<pmem/>`` into the device > > > > ``<source/>``, like this: > > > > > > > > <memory model='nvdimm'> > > > > <source> > > > > <path>/dev/pmem0</path> > > > > <pmem/> > > > > </source> > > > > </memory> > > > > > > > > Instructing QEMU to do prealloc in this case means that each > > > > page of the NVDIMM is "touched" (the first byte is read and > > > > written back - see QEMU commit v2.9.0-rc1~26^2) which contributes to > > > > device wear. > > > > > > > > 2) if free-page-reporting is turned on. While the > > > > free-page-reporting feature might not have a catchy or obvious > > > > name, when enabled it instructs KVM and subsequently QEMU to > > > > free pages no longer used by guest resulting in smaller memory > > > > footprint. And preallocating whole memory goes against this. > > > > > > > > The BZ comment 11 mentions another, third case 'virtio-mem' but > > > > that is not implemented in libvirt, yet. 
> > > > > > > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1894053 > > > > Signed-off-by: Michal Privoznik > > > > --- > > > > src/qemu/qemu_command.c | 11 +-- > > > > .../memory-hotplug-nvdimm-pmem.x86_64-latest.args | 2 +- > > > > 2 files changed, 10 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > > > > index 479bcc0b0c..3df8b5ac76 100644 > > > > --- a/src/qemu/qemu_command.c > > > > +++ b/src/qemu/qemu_command.c > > > > @@ -2977,7 +2977,11 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > > > > *backendProps, > > > > if (discard == VIR_TRISTATE_BOOL_ABSENT) > > > > discard = def->mem.discard; > > > > -if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE) > > > > +/* The whole point of free_page_reporting is that as soon as guest > > > > frees > > > > + * any memory it is freed in the host too. Prealloc doesn't make > > > > much sense > > > > + * then. */ > > > > +if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE > > > > && > > > > +def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON) > > > > prealloc = true; > > > > > > If the user asked for allocation == immediate, we should not be > > > silently ignoring that request. Isn't the scenario described simply > > > a wierd user configuration scenario and if they don't want that, then > > > then they can set instead. > > > > Okay. > > > > > > > > > if (virDomainNumatuneGetMode(def->numa, mem->targetNode, &mode) < > > > > 0 && > > > > @@ -3064,7 +3068,10 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > > > > *backendProps, > > > > if (mem->nvdimmPath) { > > > > memPath = g_strdup(mem->nvdimmPath); > > > > -prealloc = true; > > > > > > > > > > > > > +/* If the NVDIMM is a real device then there's nothing to > > > > prealloc. > > > > + * If anyhing, we would be only wearing off the device. 
*/ > > > > +if (!mem->nvdimmPmem) > > > > +prealloc = true; > > > I wonder if QEMU itself should take this optimization to skip its > > > allocation logic ? by default QEMU does not prealloc, and if users explicitly ask for prealloc, they should get it. So libvirt also shouldn't set prealloc by default when it comes to nvdimm on a file that's allocated on pmem enabled storage. > > Also would make sense. This is that kind of bug which lies in between > > libvirt and qemu. Although, since we are worried about silently ignoring user > > requests, then wouldn't this be exactly what QEMU would be doing? I mean, if > > a user/libvirt put both .prealloc=yes and .pmem=yes onto cmd line then > > these would cancel out, wouldn't they? > > The difference is that a real NVDIMM is inherently preallocated. QEMU that's assuming the used backend file is on an NVDIMM (pmem=on doesn't guarantee it though) > would not be ignoring the prealloc=yes arg - its implementation would > merely be a no-op. As for ignoring the user's input, I don't like it (it usually bites down the road). if we decide that "pmem=on + prealloc=on" is an invalid combo, I'd rather error out with a "fix your CLI" kind of message, or we can warn the user that the combination of options is not optimal. > Regards, > Daniel
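Putting the exchange together, the pre-allocation policy being converged on looks roughly like this (the field names mirror the libvirt patch, but the function itself is only a sketch, not libvirt code):

```python
# Sketch of the prealloc policy discussed above: an explicit user request
# ("immediate" allocation) always wins; otherwise skip prealloc for real
# pmem-backed NVDIMMs (inherently allocated, touching pages only wears
# the device) and keep it for hugepages and file-backed fake NVDIMMs.

def want_prealloc(allocation_immediate, uses_hugepages,
                  is_nvdimm, nvdimm_is_real_pmem):
    if allocation_immediate:
        # Daniel's point: never silently ignore an explicit request,
        # even when free-page-reporting is enabled
        return True
    if is_nvdimm:
        return not nvdimm_is_real_pmem
    return uses_hugepages
```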
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
On Fri, 23 Oct 2020 11:54:40 -0400 "Michael S. Tsirkin" wrote: > On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote: > > Hi David, > > > > On Fri, Oct 23, 2020 at 6:49 AM David Gibson wrote: > > > > On Thu, 22 Oct 2020 11:01:04 -0400 > > "Michael S. Tsirkin" wrote: > > > > > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote: > > > [...] > > > > > > Right. After detecting just failing unconditionally it a bit too > > > simplistic IMHO. > > > > There's also another factor here, which I thought I'd mentioned > > already, but looks like I didn't: I think we're still missing some > > details in what's going on. > > > > The premise for this patch is that plugging while the indicator is in > > transition state is allowed to fail in any way on the guest side. I > > don't think that's a reasonable interpretation, because it's unworkable > > for physical hotplug. If the indicator starts blinking while you're > > in > > the middle of shoving a card in, you'd be in trouble. > > > > So, what I'm assuming here is that while "don't plug while blinking" is > > the instruction for the operator to obey as best they can, on the guest > > side the rule has to be "start blinking, wait a while and by the time > > you leave blinking state again, you can be confident any plugs or > > unplugs have completed". Obviously still racy in the strict computer > > science sense, but about the best you can do with slow humans in the > > mix. > > > > So, qemu should of course endeavour to follow that rule as though it > > was a human operator on a physical machine and not plug when the > > indicator is blinking. *But* the qemu plug will in practice be fast > > enough that if we're hitting real problems here, it suggests the guest > > is still doing something wrong. > > > > > > I personally think there is a little bit of over-engineering here. 
> > Let's start with the spec: > > > >   Power Indicator Blinking > >   A blinking Power Indicator indicates that the slot is powering up or powering down and that insertion or removal of the adapter is not permitted. > > > > What exactly is an interpretation here? > > As you stated, the races are theoretical; the whole point of the indicator > > is to let the operator know he can't plug the device just yet. > > > > I understand it would be more user friendly if QEMU would wait internally for the blinking to end, but the whole point of the indicator is to let the operator (human or machine) know they can't plug the device at a specific time. > > Should QEMU take over the responsibility of the operator? Is it even correct? > > > > Even if we wanted such a feature, how is it related to this patch? > > The patch simply refuses to start a hotplug operation when it knows it will not succeed. > > > > Another way that would make sense to me would be a new QEMU interface other than "add_device", let's say "adding_device_allowed", that would return true if the hotplug is allowed at this point in time. (I am aware of the theoretical races) > > Rather than adding_device_allowed, something like "query slot" > might be helpful for debugging. That would help the user figure out > e.g. why a device isn't visible, without any races. Would a new command be useful though? What we end up with is a broken guest (if I read the commit message right) and a user who has no idea whether device_add was successful or not. So what should the user do in this case: - wait till it explodes? - can the user remove it, or will it be stuck there forever? - poll the slot before hotplug, manually? (if this is the case, then failing device_add cleanly doesn't sound bad; it looks similar to another error we have, "/* Check if hot-plug is disabled on the slot */", in pcie_cap_slot_pre_plug_cb) CCing libvirt, as this concerns not only QEMU. 
> > > The above will at least mimic the mechanics of the physical world. The operator looks at the indicator, > > the management software checks if adding the device is allowed. > > Since it is a corner case I would prefer device_add to fail rather than introducing a new interface, > > but that's just me. > > > > Thanks, > > Marcel > > > > I think we want the QEMU management interface to be reasonably > abstract and agnostic if possible. Pushing knowledge of hardware > details to management will just lead to pain IMHO. > We have supported device_add, which practically never fails, for years, For CPUs and RAM, device_add can fail, so maybe management is also prepared to handle errors on the PCI hotplug path. > at this point it's easier to keep supporting it than > change all users ... > > -- > > David Gibson > > Principal Software Engineer, Virtualization, Red Hat
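The refusal the patch implements hinges on reading the slot's Power Indicator state. Here is a minimal, hypothetical sketch of that check, using the Power Indicator Control field of the PCIe Slot Control register (bits 9:8; the mask and field values below match the spec encoding, but the helper itself is not QEMU's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCIe Slot Control register, Power Indicator Control field (bits 9:8). */
#define PCI_EXP_SLTCTL_PIC            0x0300  /* field mask */
#define PCI_EXP_SLTCTL_PWR_IND_ON     0x0100
#define PCI_EXP_SLTCTL_PWR_IND_BLINK  0x0200
#define PCI_EXP_SLTCTL_PWR_IND_OFF    0x0300

/*
 * Hypothetical helper: a blinking indicator means the slot is powering
 * up or down, and insertion/removal of an adapter is not permitted --
 * so a hotplug request arriving in that window would be refused.
 */
static bool slot_accepts_hotplug(uint16_t slot_ctl)
{
    return (slot_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_BLINK;
}
```

This is the mechanical core; the thread's disagreement is about policy, i.e. whether QEMU should fail device_add on this condition or leave the decision to the management layer.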
Re: [PATCH REBASE 7/7] qemu: Use memory-backend-* for regular guest memory
On Tue, 15 Sep 2020 10:59:04 +0100 Daniel P. Berrangé wrote: > On Tue, Sep 15, 2020 at 11:53:56AM +0200, Igor Mammedov wrote: > > On Tue, 15 Sep 2020 10:54:46 +0200 > > Michal Privoznik wrote: > > > > > On 9/8/20 3:55 PM, Ján Tomko wrote: > > > > On a Tuesday in 2020, Michal Privoznik wrote: > > > > > > >> diff --git > > > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> > > > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> index 5d256c42bc..b43e7d9c3c 100644 > > > >> --- > > > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> +++ > > > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> @@ -12,14 +12,16 @@ QEMU_AUDIO_DRV=none \ > > > >> -S \ > > > >> -object secret,id=masterKey0,format=raw,\ > > > >> file=/tmp/lib/domain--1-instance-0092/master-key.aes \ > > > >> --machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off \ > > > >> +-machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off,\ > > > >> +memory-backend=pc.ram \ > > > >> -cpu qemu64 \ > > > >> -m 14336 \ > > > >> --mem-prealloc \ > > > >> +-object > > > >> memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,\ > > > >> +share=yes,prealloc=yes,size=15032385536 \ > > > >> -overcommit mem-lock=off \ > > > >> -smp 8,sockets=1,dies=1,cores=8,threads=1 \ > > > >> -object > > > >> memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,\ > > > >> -share=yes,size=15032385536,host-nodes=3,policy=preferred \ > > > >> +share=yes,prealloc=yes,size=15032385536,host-nodes=3,policy=preferred > > > >> \ > > > >> -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \ > > > >> -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ > > > >> -display none \ > > > > > > > > Should we format all the fields twice in these cases? > > > > > > Ah, good question. Honestly, I don't remember, it was slightly longer > > > ago that I've written these patches. 
Igor, do you perhaps remember > > > whether libvirt needs to specify both: -machine memory-backend=$id and > > > -object memory-backend-*,id=$id? > > > > the latter defines the backend and the former uses it, > > short answer is yes. > > > > you do not need > > --mem-prealloc > > if you explicitly set "prealloc=yes" on the backend. > > > > I'd prefer it if libvirt stopped using the old -mem-prealloc and -mem-path > > in favor of explicit properties on the backend, so QEMU could deprecate > > them and drop the aliasing code, which uses a global-properties hack. > > IIRC, we tried to do that in the past and the change to use a backend > impacted migration ABI compatibility. For new machine types that shouldn't happen, as they use memory-backend internally (assuming the CLI isn't messed up). Old machine types should cope with the switch too; the only thing we were not able to convert was "-numa node,mem" => "-numa memdev", due to the odd sizes 'mem' allowed, which is why "-numa node,mem" was preserved for old machine types. > > > Regards, > Daniel
[PATCH v3] cphp: remove deprecated cpu-add command(s)
These were deprecated since 4.0, remove both HMP and QMP variants. Users should use device_add command instead. To get list of possible CPUs and options, use 'info hotpluggable-cpus' HMP or query-hotpluggable-cpus QMP command. Signed-off-by: Igor Mammedov Reviewed-by: Thomas Huth Acked-by: Dr. David Alan Gilbert Reviewed-by: Michal Privoznik Acked-by: Cornelia Huck --- v2,3: fix typos in commit message include/hw/boards.h | 1 - include/hw/i386/pc.h| 1 - include/monitor/hmp.h | 1 - docs/system/deprecated.rst | 25 + hmp-commands.hx | 15 -- hw/core/machine-hmp-cmds.c | 12 - hw/core/machine-qmp-cmds.c | 12 - hw/i386/pc.c| 27 -- hw/i386/pc_piix.c | 1 - hw/s390x/s390-virtio-ccw.c | 12 - qapi/machine.json | 24 - tests/qtest/cpu-plug-test.c | 100 tests/qtest/test-hmp.c | 1 - 13 files changed, 21 insertions(+), 211 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 795910d01b..7abd5d889c 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -169,7 +169,6 @@ struct MachineClass { void (*init)(MachineState *state); void (*reset)(MachineState *state); void (*wakeup)(MachineState *state); -void (*hot_add_cpu)(MachineState *state, const int64_t id, Error **errp); int (*kvm_type)(MachineState *machine, const char *arg); void (*smp_parse)(MachineState *ms, QemuOpts *opts); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 421a77acc2..79b7ab17bc 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -135,7 +135,6 @@ extern int fd_bootchk; void pc_acpi_smi_interrupt(void *opaque, int irq, int level); -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp); void pc_smp_parse(MachineState *ms, QemuOpts *opts); void pc_guest_info_init(PCMachineState *pcms); diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index c986cfd28b..642e9e91f9 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -89,7 +89,6 @@ void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_change(Monitor *mon, 
const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void hmp_chardev_send_break(Monitor *mon, const QDict *qdict); -void hmp_cpu_add(Monitor *mon, const QDict *qdict); void hmp_object_add(Monitor *mon, const QDict *qdict); void hmp_object_del(Monitor *mon, const QDict *qdict); void hmp_info_memdev(Monitor *mon, const QDict *qdict); diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index a158e765c3..c43c53f432 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -284,13 +284,6 @@ The ``query-cpus`` command is replaced by the ``query-cpus-fast`` command. The ``arch`` output member of the ``query-cpus-fast`` command is replaced by the ``target`` output member. -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional -details. - ``query-events`` (since 4.0) '''''''''''''''''''''''''''' @@ -306,12 +299,6 @@ the 'wait' field, which is only applicable to sockets in server mode Human Monitor Protocol (HMP) commands - -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional details. - ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -521,6 +508,12 @@ QEMU Machine Protocol (QMP) commands The "autoload" parameter has been ignored since 2.12.0. All bitmaps are automatically loaded from qcow2 images. +``cpu-add`` (removed in 5.2) +''''''''''''''''''''''''''''
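The replacement flow the commit message points at can be sketched at the HMP monitor. The CPU type, IDs, and topology properties below are hypothetical and depend on the machine; 'info hotpluggable-cpus' reports the actual values to use:

```
(qemu) info hotpluggable-cpus
(qemu) device_add qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=0,thread-id=0
```

The first command lists each possible CPU slot with the exact properties device_add expects, which is what makes the generic device_add path a full substitute for the removed cpu-add.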
Re: [PATCH REBASE 7/7] qemu: Use memory-backend-* for regular guest memory
On Tue, 15 Sep 2020 10:54:46 +0200 Michal Privoznik wrote: > On 9/8/20 3:55 PM, Ján Tomko wrote: > > On a Tuesday in 2020, Michal Privoznik wrote: > > >> diff --git > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> index 5d256c42bc..b43e7d9c3c 100644 > >> --- > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> +++ > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> @@ -12,14 +12,16 @@ QEMU_AUDIO_DRV=none \ > >> -S \ > >> -object secret,id=masterKey0,format=raw,\ > >> file=/tmp/lib/domain--1-instance-0092/master-key.aes \ > >> --machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off \ > >> +-machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off,\ > >> +memory-backend=pc.ram \ > >> -cpu qemu64 \ > >> -m 14336 \ > >> --mem-prealloc \ > >> +-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,\ > >> +share=yes,prealloc=yes,size=15032385536 \ > >> -overcommit mem-lock=off \ > >> -smp 8,sockets=1,dies=1,cores=8,threads=1 \ > >> -object > >> memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,\ > >> -share=yes,size=15032385536,host-nodes=3,policy=preferred \ > >> +share=yes,prealloc=yes,size=15032385536,host-nodes=3,policy=preferred \ > >> -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \ > >> -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ > >> -display none \ > > > > Should we format all the fields twice in these cases? > > Ah, good question. Honestly, I don't remember, it was slightly longer > ago that I've written these patches. Igor, do you perhaps remember > whether libvirt needs to specify both: -machine memory-backend=$id and > -object memory-backend-*,id=$id? the later defines backend and the former uses it, short answer is yes. you do not need --mem-prealloc if you explicitly set "prealloc=yes" on backend. 
I'd prefer it if libvirt stopped using the old -mem-prealloc and -mem-path in favor of explicit properties on the backend, so QEMU could deprecate them and drop the aliasing code, which uses a global-properties hack. Also, if '-machine memory-backend=' is used and '-m' only sets the initial RAM size, then '-m' can be omitted, as the size will be derived from the backend in use. > > Michal
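Igor's answer — define the backend with -object and consume it through the machine property — can be sketched as a command line. The IDs and size here are hypothetical, not taken from the test case above:

```
qemu-system-x86_64 \
  -machine pc,memory-backend=pc.ram \
  -object memory-backend-memfd,id=pc.ram,share=yes,prealloc=yes,size=4G
```

With this form, -mem-prealloc is replaced by the explicit prealloc=yes property, and (per the note above) -m can be dropped as well, since the guest RAM size is derived from the backend.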
Re: [PATCH v2] cphp: remove deprecated cpu-add command(s)
On Mon, 14 Sep 2020 10:07:36 +0200 Michal Privoznik wrote: > On 9/14/20 9:46 AM, Igor Mammedov wrote: > > these were deprecated since 4.0, remove both HMP and QMP variants. > > > > Users should use the device_add command instead. To get the list of > > possible CPUs and options, use the 'info hotpluggable-cpus' HMP > > or query-hotpluggable-cpus QMP command. > > > > Signed-off-by: Igor Mammedov > > Reviewed-by: Thomas Huth > > Acked-by: Dr. David Alan Gilbert > > --- > > include/hw/boards.h | 1 - > > include/hw/i386/pc.h| 1 - > > include/monitor/hmp.h | 1 - > > docs/system/deprecated.rst | 25 + > > hmp-commands.hx | 15 -- > > hw/core/machine-hmp-cmds.c | 12 - > > hw/core/machine-qmp-cmds.c | 12 - > > hw/i386/pc.c| 27 -- > > hw/i386/pc_piix.c | 1 - > > hw/s390x/s390-virtio-ccw.c | 12 - > > qapi/machine.json | 24 - > > tests/qtest/cpu-plug-test.c | 100 > > tests/qtest/test-hmp.c | 1 - > > 13 files changed, 21 insertions(+), 211 deletions(-) > > Thanks to Peter, Libvirt uses device_add instead of cpu_add whenever > possible. Hence this is okay from Libvirt's POV. we should make libvirt switch from -numa node,cpus= to -numa cpu= to get rid of the 'last' interface that uses cpu-index as input. To help libvirt migrate existing configs from the older syntax to the newer one, we could introduce a field x-cpu-index in the query-hotpluggable-cpus output (with a goal to deprecate it in a few years). Would that work for you? > > Reviewed-by: Michal Privoznik Thanks! > > Michal >
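The syntax migration Igor proposes can be sketched side by side; the node and topology IDs here are hypothetical, and which topology properties identify a CPU depends on the machine type:

```
# legacy, cpu-index based assignment (the interface to retire):
-numa node,nodeid=0,cpus=0-3

# replacement, topology-property based assignment:
-numa cpu,node-id=0,socket-id=0
```

The second form binds CPUs to NUMA nodes by socket/core/thread properties rather than by bare cpu-index, which is why a helper such as the suggested x-cpu-index field would ease translating existing configs.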
[PATCH v2] cphp: remove deprecated cpu-add command(s)
theses were deprecated since 4.0, remove both HMP and QMP variants. Users should use device_add command instead. To get list of possible CPUs and options, use 'info hotpluggable-cpus' HMP or query-hotpluggable-cpus QMP command. Signed-off-by: Igor Mammedov Reviewed-by: Thomas Huth Acked-by: Dr. David Alan Gilbert --- include/hw/boards.h | 1 - include/hw/i386/pc.h| 1 - include/monitor/hmp.h | 1 - docs/system/deprecated.rst | 25 + hmp-commands.hx | 15 -- hw/core/machine-hmp-cmds.c | 12 - hw/core/machine-qmp-cmds.c | 12 - hw/i386/pc.c| 27 -- hw/i386/pc_piix.c | 1 - hw/s390x/s390-virtio-ccw.c | 12 - qapi/machine.json | 24 - tests/qtest/cpu-plug-test.c | 100 tests/qtest/test-hmp.c | 1 - 13 files changed, 21 insertions(+), 211 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 795910d01b..7abd5d889c 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -169,7 +169,6 @@ struct MachineClass { void (*init)(MachineState *state); void (*reset)(MachineState *state); void (*wakeup)(MachineState *state); -void (*hot_add_cpu)(MachineState *state, const int64_t id, Error **errp); int (*kvm_type)(MachineState *machine, const char *arg); void (*smp_parse)(MachineState *ms, QemuOpts *opts); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 421a77acc2..79b7ab17bc 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -135,7 +135,6 @@ extern int fd_bootchk; void pc_acpi_smi_interrupt(void *opaque, int irq, int level); -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp); void pc_smp_parse(MachineState *ms, QemuOpts *opts); void pc_guest_info_init(PCMachineState *pcms); diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index c986cfd28b..642e9e91f9 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -89,7 +89,6 @@ void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_change(Monitor *mon, const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void 
hmp_chardev_send_break(Monitor *mon, const QDict *qdict); -void hmp_cpu_add(Monitor *mon, const QDict *qdict); void hmp_object_add(Monitor *mon, const QDict *qdict); void hmp_object_del(Monitor *mon, const QDict *qdict); void hmp_info_memdev(Monitor *mon, const QDict *qdict); diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index a158e765c3..c43c53f432 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -284,13 +284,6 @@ The ``query-cpus`` command is replaced by the ``query-cpus-fast`` command. The ``arch`` output member of the ``query-cpus-fast`` command is replaced by the ``target`` output member. -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional -details. - ``query-events`` (since 4.0) '''''''''''''''''''''''''''' @@ -306,12 +299,6 @@ the 'wait' field, which is only applicable to sockets in server mode Human Monitor Protocol (HMP) commands - -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional details. - ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -521,6 +508,12 @@ QEMU Machine Protocol (QMP) commands The "autoload" parameter has been ignored since 2.12.0. All bitmaps are automatically loaded from qcow2 images. +``cpu-add`` (removed in 5.2) +'''''''''''''''''''''''''''' + +Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See +documentation of `
Re: [PATCH] smp: drop support for deprecated (invalid topologies)
On Fri, 11 Sep 2020 11:04:47 -0400 "Michael S. Tsirkin" wrote: > On Fri, Sep 11, 2020 at 09:32:02AM -0400, Igor Mammedov wrote: > > it's was deprecated since 3.1 > > > > Support for invalid topologies is removed, the user must ensure > > that topologies described with -smp include all possible cpus, > > i.e. (sockets * cores * threads) == maxcpus or QEMU will > > exit with error. > > > > Signed-off-by: Igor Mammedov > > Acked-by: > > memory tree I guess? It would be better for Paolo to take it since he has queued numa deprecations, due to context confilict in deprecated.rst. Paolo, can you queue this patch as well? > > > --- > > docs/system/deprecated.rst | 26 +- > > hw/core/machine.c | 16 > > hw/i386/pc.c | 16 > > 3 files changed, 21 insertions(+), 37 deletions(-) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index 122717cfee..d737728fab 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -47,19 +47,6 @@ The 'file' driver for drives is no longer appropriate > > for character or host > > devices and will only accept regular files (S_IFREG). The correct driver > > for these file types is 'host_cdrom' or 'host_device' as appropriate. > > > > -``-smp`` (invalid topologies) (since 3.1) > > -''''''''''''''''''''''''''''''''''''''''' > > - > > -CPU topology properties should describe whole machine topology including > > -possible CPUs. > > - > > -However, historically it was possible to start QEMU with an incorrect > > topology > > -where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, > > -which could lead to an incorrect topology enumeration by the guest. > > -Support for invalid topologies will be removed, the user must ensure > > -topologies described with -smp include all possible cpus, i.e. > > -*sockets* * *cores* * *threads* = *maxcpus*. 
> > - > > ``-vnc acl`` (since 4.0.0) > > '''''''''''''''''''''''''' > > > > @@ -618,6 +605,19 @@ New machine versions (since 5.1) will not accept the > > option but it will still > > work with old machine types. User can check the QAPI schema to see if the > > legacy > > option is supported by looking at MachineInfo::numa-mem-supported property. > > > > +``-smp`` (invalid topologies) (removed 5.2) > > +''''''''''''''''''''''''''''''''''''''''''' > > + > > +CPU topology properties should describe whole machine topology including > > +possible CPUs. > > + > > +However, historically it was possible to start QEMU with an incorrect > > topology > > +where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, > > +which could lead to an incorrect topology enumeration by the guest. > > +Support for invalid topologies is removed, the user must ensure > > +topologies described with -smp include all possible cpus, i.e. > > +*sockets* * *cores* * *threads* = *maxcpus*. > > + > > Block devices > > - > > > > diff --git a/hw/core/machine.c b/hw/core/machine.c > > index ea26d61237..09aee4ea52 100644 > > --- a/hw/core/machine.c > > +++ b/hw/core/machine.c > > @@ -754,23 +754,15 @@ static void smp_parse(MachineState *ms, QemuOpts > > *opts) > > exit(1); > > } > > > > -if (sockets * cores * threads > ms->smp.max_cpus) { > > -error_report("cpu topology: " > > - "sockets (%u) * cores (%u) * threads (%u) > " > > - "maxcpus (%u)", > > +if (sockets * cores * threads != ms->smp.max_cpus) { > > +error_report("Invalid CPU topology: " > > + "sockets (%u) * cores (%u) * threads (%u) " > > + "!= maxcpus (%u)", > > sockets, cores, threads, > >
[PATCH] smp: drop support for deprecated (invalid topologies)
it's was deprecated since 3.1 Support for invalid topologies is removed, the user must ensure that topologies described with -smp include all possible cpus, i.e. (sockets * cores * threads) == maxcpus or QEMU will exit with error. Signed-off-by: Igor Mammedov --- docs/system/deprecated.rst | 26 +- hw/core/machine.c | 16 hw/i386/pc.c | 16 3 files changed, 21 insertions(+), 37 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 122717cfee..d737728fab 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -47,19 +47,6 @@ The 'file' driver for drives is no longer appropriate for character or host devices and will only accept regular files (S_IFREG). The correct driver for these file types is 'host_cdrom' or 'host_device' as appropriate. -``-smp`` (invalid topologies) (since 3.1) -''''''''''''''''''''''''''''''''''''''''' - -CPU topology properties should describe whole machine topology including -possible CPUs. - -However, historically it was possible to start QEMU with an incorrect topology -where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, -which could lead to an incorrect topology enumeration by the guest. -Support for invalid topologies will be removed, the user must ensure -topologies described with -smp include all possible cpus, i.e. -*sockets* * *cores* * *threads* = *maxcpus*. - ``-vnc acl`` (since 4.0.0) '''''''''''''''''''''''''' @@ -618,6 +605,19 @@ New machine versions (since 5.1) will not accept the option but it will still work with old machine types. User can check the QAPI schema to see if the legacy option is supported by looking at MachineInfo::numa-mem-supported property. +``-smp`` (invalid topologies) (removed 5.2) +''''''''''''''''''''''''''''''''''''''''''' + +CPU topology properties should describe whole machine topology including +possible CPUs. 
+ +However, historically it was possible to start QEMU with an incorrect topology +where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, +which could lead to an incorrect topology enumeration by the guest. +Support for invalid topologies is removed, the user must ensure +topologies described with -smp include all possible cpus, i.e. +*sockets* * *cores* * *threads* = *maxcpus*. + Block devices - diff --git a/hw/core/machine.c b/hw/core/machine.c index ea26d61237..09aee4ea52 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -754,23 +754,15 @@ static void smp_parse(MachineState *ms, QemuOpts *opts) exit(1); } -if (sockets * cores * threads > ms->smp.max_cpus) { -error_report("cpu topology: " - "sockets (%u) * cores (%u) * threads (%u) > " - "maxcpus (%u)", +if (sockets * cores * threads != ms->smp.max_cpus) { +error_report("Invalid CPU topology: " + "sockets (%u) * cores (%u) * threads (%u) " + "!= maxcpus (%u)", sockets, cores, threads, ms->smp.max_cpus); exit(1); } -if (sockets * cores * threads != ms->smp.max_cpus) { -warn_report("Invalid CPU topology deprecated: " -"sockets (%u) * cores (%u) * threads (%u) " -"!= maxcpus (%u)", -sockets, cores, threads, -ms->smp.max_cpus); -} - ms->smp.cpus = cpus; ms->smp.cores = cores; ms->smp.threads = threads; diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d071da787b..fbde6b04e6 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -746,23 +746,15 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts) exit(1); } -if (sockets * dies * cores * threads > ms->smp.max_cpus) { -error_report("cpu topology: " - "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > " - "maxcpus (%u)", +if (sockets * dies * cores * threads != ms->smp.max_cpus) { +error_report("Invalid CPU topology deprecated: " + "sockets (%u) * dies (%u) * cores (%u) * threads (%u) " + "!= maxcpus (%u)", sockets, dies, cores, threads, ms->smp.max_cpus); exit(1); } -if (sockets * dies * cores * threads != ms->smp.max_cpus) { -warn_report("Invalid CPU topology 
deprecated: " -"sockets (%u) * dies (%u) * cores (%u) * threads (%u) " -"!= maxcpus (%u)", -sockets, dies, cores, threads, -ms->smp.max_cpus); -} - ms->smp.cpus = cpus; ms->smp.cores = cores; ms->smp.threads = threads; -- 2.27.0
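The condition the patch turns from a warning into a hard error is a single product check. A minimal sketch of the accepted topologies (simplified; the real code also validates cpus <= maxcpus and, on PC, a dies factor):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the -smp validation enforced above: a topology is valid
 * only if it describes every possible CPU exactly, i.e.
 * sockets * cores * threads == maxcpus.
 */
static bool smp_topology_valid(unsigned sockets, unsigned cores,
                               unsigned threads, unsigned maxcpus)
{
    return sockets * cores * threads == maxcpus;
}
```

Under the removed behavior, a product strictly between cpus and maxcpus only produced a deprecation warning; after this patch such configurations make QEMU exit with an error.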
[PATCH] cphp: remove deprecated cpu-add command(s)
theses were deprecatedince since 4.0, remove both HMP and QMP variants. Users should use device_add commnad instead. To get list of possible CPUs and options, use 'info hotpluggable-cpus' HMP or query-hotpluggable-cpus QMP command. Signed-off-by: Igor Mammedov --- include/hw/boards.h | 1 - include/hw/i386/pc.h| 1 - include/monitor/hmp.h | 1 - docs/system/deprecated.rst | 25 + hmp-commands.hx | 15 -- hw/core/machine-hmp-cmds.c | 12 - hw/core/machine-qmp-cmds.c | 12 - hw/i386/pc.c| 27 -- hw/i386/pc_piix.c | 1 - hw/s390x/s390-virtio-ccw.c | 12 - qapi/machine.json | 24 - tests/qtest/cpu-plug-test.c | 100 tests/qtest/test-hmp.c | 1 - 13 files changed, 21 insertions(+), 211 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index bc5b82ad20..2163843bdb 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -173,7 +173,6 @@ struct MachineClass { void (*init)(MachineState *state); void (*reset)(MachineState *state); void (*wakeup)(MachineState *state); -void (*hot_add_cpu)(MachineState *state, const int64_t id, Error **errp); int (*kvm_type)(MachineState *machine, const char *arg); void (*smp_parse)(MachineState *ms, QemuOpts *opts); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index fe52e165b2..ca8ff6cd27 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -137,7 +137,6 @@ extern int fd_bootchk; void pc_acpi_smi_interrupt(void *opaque, int irq, int level); -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp); void pc_smp_parse(MachineState *ms, QemuOpts *opts); void pc_guest_info_init(PCMachineState *pcms); diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index c986cfd28b..642e9e91f9 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -89,7 +89,6 @@ void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_change(Monitor *mon, const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void hmp_chardev_send_break(Monitor *mon, const QDict *qdict); 
-void hmp_cpu_add(Monitor *mon, const QDict *qdict); void hmp_object_add(Monitor *mon, const QDict *qdict); void hmp_object_del(Monitor *mon, const QDict *qdict); void hmp_info_memdev(Monitor *mon, const QDict *qdict); diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 851dbdeb8a..122717cfee 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -284,13 +284,6 @@ The ``query-cpus`` command is replaced by the ``query-cpus-fast`` command. The ``arch`` output member of the ``query-cpus-fast`` command is replaced by the ``target`` output member. -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional -details. - ``query-events`` (since 4.0) '''''''''''''''''''''''''''' @@ -306,12 +299,6 @@ the 'wait' field, which is only applicable to sockets in server mode Human Monitor Protocol (HMP) commands - -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional details. - ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -514,6 +501,12 @@ QEMU Machine Protocol (QMP) commands The "autoload" parameter has been ignored since 2.12.0. All bitmaps are automatically loaded from qcow2 images. +``cpu-add`` (removed in 5.2) +'''''''''''''''''''''''''''' + +Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See +documentation of ``query-hotpluggable-cpus`` for additional details. +
[PATCH 2/3] doc: Cleanup "'-mem-path' fallback to RAM" deprecation text
it was actually removed in 5.0, commit 68a86dc15c (numa: remove deprecated -mem-path fallback to anonymous RAM) clean up forgotten remnants in docs. Signed-off-by: Igor Mammedov --- docs/system/deprecated.rst | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 6f9441005a..f252c92901 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -104,17 +104,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-mem-path`` fallback to RAM (since 4.1) -''''''''''''''''''''''''''''''''''''''''' - -Currently if guest RAM allocation from file pointed by ``mem-path`` -fails, QEMU falls back to allocating from RAM, which might result -in unpredictable behavior since the backing file specified by the user -is ignored. In the future, users will be responsible for making sure -the backing storage specified with ``-mem-path`` can actually provide -the guest RAM configured with ``-m`` and QEMU will fail to start up if -RAM allocation is unsuccessful. - RISC-V ``-bios`` (since 5.1) '''''''''''''''''''''''''''' @@ -624,6 +613,16 @@ New machine versions (since 5.1) will not accept the option but it will still work with old machine types. User can check the QAPI schema to see if the legacy option is supported by looking at MachineInfo::numa-mem-supported property. +``-mem-path`` fallback to RAM (remove 5.0) +'''''''''''''''''''''''''''''''''''''''''' + +If guest RAM allocation from file pointed by ``mem-path`` failed, +QEMU was falling back to allocating from RAM, which might have resulted +in unpredictable behavior since the backing file specified by the user +as ignored. Currently, users are responsible for making sure the backing storage +specified with ``-mem-path`` can actually provide the guest RAM configured with +``-m`` and QEMU fails to start up if RAM allocation is unsuccessful. 
+ Block devices - -- 2.27.0
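With the fallback removed, a ``-mem-path`` guest only starts if the backing storage can really satisfy the configured RAM. A minimal sketch of what that means in practice (paths and sizes are illustrative, not taken from the patch):

```shell
# Reserve enough 2 MiB hugepages to cover the guest RAM first;
# QEMU now fails to start instead of silently falling back to anonymous RAM.
echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
qemu-system-x86_64 -m 1G -mem-path /dev/hugepages
```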
[PATCH 3/3] numa: remove fixup numa_state->num_nodes to MAX_NODES
The current code permits only node IDs in the [0..MAX_NODES) range due to the nodeid check in parse_numa_node(): if (nodenr >= MAX_NODES) { error_setg(errp, "Max number of NUMA nodes reached: %" so the subject fixup is not reachable; drop it. Signed-off-by: Igor Mammedov --- hw/core/numa.c | 4 1 file changed, 4 deletions(-) diff --git a/hw/core/numa.c b/hw/core/numa.c index 706c1e84c6..7d5d413001 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -677,10 +677,6 @@ void numa_complete_configuration(MachineState *ms) if (ms->numa_state->num_nodes > 0) { uint64_t numa_total; -if (ms->numa_state->num_nodes > MAX_NODES) { -ms->numa_state->num_nodes = MAX_NODES; -} - numa_total = 0; for (i = 0; i < ms->numa_state->num_nodes; i++) { numa_total += numa_info[i].node_mem; -- 2.27.0
[PATCH 1/3] numa: drop support for '-numa node' (without memory specified)
It has been deprecated since 4.1, commit 4bb4a2732e (numa: deprecate implict memory distribution between nodes). Users of existing VMs, wishing to preserve the same RAM distribution, should configure it explicitly using ``-numa node,memdev`` options. The current RAM distribution can be retrieved using the HMP command `info numa`, and if separate memory devices (pc|nv-dimm) are present, use `info memory-device` and subtract device memory from the output of `info numa`. Signed-off-by: Igor Mammedov --- include/hw/boards.h| 2 -- include/sysemu/numa.h | 4 --- docs/system/deprecated.rst | 23 +--- hw/core/machine.c | 1 - hw/core/numa.c | 55 -- hw/i386/pc_piix.c | 1 - hw/i386/pc_q35.c | 1 - hw/ppc/spapr.c | 1 - 8 files changed, 14 insertions(+), 74 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index bc5b82ad20..15fc1a2bac 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -208,8 +208,6 @@ struct MachineClass { strList *allowed_dynamic_sysbus_devices; bool auto_enable_numa_with_memhp; bool auto_enable_numa_with_memdev; -void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index ad58ee88f7..4173ef2afa 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -106,10 +106,6 @@ void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node, void numa_complete_configuration(MachineState *ms); void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms); extern QemuOptsList qemu_numa_opts; -void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); -void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); void numa_cpu_pre_plug(const struct CPUArchId *slot, DeviceState *dev, Error **errp); bool numa_uses_legacy_mem(void); diff --git a/docs/system/deprecated.rst 
b/docs/system/deprecated.rst index 851dbdeb8a..6f9441005a 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -104,15 +104,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa`` node (without memory specified) (since 4.1) -''''''''''''''''''''''''''''''''''''''''''''''''''''' - -Splitting RAM by default between NUMA nodes has the same issues as ``mem`` -parameter described above with the difference that the role of the user plays -QEMU using implicit generic or board specific splitting rule. -Use ``memdev`` with *memory-backend-ram* backend or ``mem`` (if -it's supported by used machine type) to define mapping explictly instead. - ``-mem-path`` fallback to RAM (since 4.1) ''''''''''''''''''''''''''''''''''''''''' @@ -602,6 +593,20 @@ error when ``-u`` is not used. Command line options +``-numa`` node (without memory specified) (removed 5.2) +''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +Splitting RAM by default between NUMA nodes had the same issues as ``mem`` +parameter with the difference that the role of the user plays QEMU using +implicit generic or board specific splitting rule. +Use ``memdev`` with *memory-backend-ram* backend or ``mem`` (if +it's supported by used machine type) to define mapping explictly instead. +Users of existing VMs, wishing to preserve the same RAM distribution, should +configure it explicitly using ``-numa node,memdev`` options. Current RAM +distribution can be retrieved using HMP command ``info numa`` and if separate +memory devices (pc|nv-dimm) are present use ``info memory-device`` and subtract +device memory from output of ``info numa``. + ``-numa node,mem=``\ *size* (removed in 5.1) ''''''''''''''''''''''
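For reference, the explicit ``-numa node,memdev`` replacement named in the commit message looks roughly like this (node count and sizes are made up for illustration):

```shell
# Split 4G of guest RAM explicitly across two NUMA nodes
# instead of relying on the removed implicit distribution.
qemu-system-x86_64 -m 4G \
  -object memory-backend-ram,id=ram0,size=2G \
  -object memory-backend-ram,id=ram1,size=2G \
  -numa node,nodeid=0,memdev=ram0 \
  -numa node,nodeid=1,memdev=ram1
```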
[PATCH 0/3] numa: cleanups for 5.2
Remove the default RAM splitting between NUMA nodes that has been deprecated since 4.1, plus a couple of minor NUMA cleanups. Igor Mammedov (3): numa: drop support for '-numa node' (without memory specified) doc: Cleanup "'-mem-path' fallback to RAM" deprecation text numa: remove fixup numa_state->num_nodes to MAX_NODES include/hw/boards.h| 2 -- include/sysemu/numa.h | 4 --- docs/system/deprecated.rst | 44 +++- hw/core/machine.c | 1 - hw/core/numa.c | 59 -- hw/i386/pc_piix.c | 1 - hw/i386/pc_q35.c | 1 - hw/ppc/spapr.c | 1 - 8 files changed, 24 insertions(+), 89 deletions(-) -- 2.27.0
[PATCH v5] numa: forbid '-numa node,mem' for 5.1 and newer machine types
The deprecation period has run out and it's time to flip the switch introduced by cd5ff8333a. Disable the legacy option for new machine types (since 5.1) and amend the documentation. '-numa node,memdev' shall be used instead of the disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik Reviewed-by: Michael S. Tsirkin Reviewed-by: Greg Kurz --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprecation text to recently removed section v3: - increase title line length for (deprecated.rst) '``-numa node,mem=``\ *size* (removed in 5.1)' v4: - use error_append_hint() for suggesting valid CLI v5: - add "\n" at the end of error_append_hint() - fix grammar/spelling in moved deprecation text CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org CC: ebl...@redhat.com CC: gr...@kaod.org --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 7 +++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 36 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 544ece0a45..72666ac764 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. 
But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -516,3 +499,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +'''''''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` was used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage a specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so the guest ends up with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actually manage node RAM on the host side. 
Use parameter ``memdev`` +with *memory-backend-ram* backend as replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +New machine versions (since 5.1) will not accept the option but it will still +work with old machine types. User can check the QAPI schema to see if the
Re: [PATCH v4] numa: forbid '-numa node,mem' for 5.1 and newer machine types
On Mon, 8 Jun 2020 08:55:08 -0400 "Michael S. Tsirkin" wrote: > On Mon, Jun 08, 2020 at 08:03:44AM -0400, Igor Mammedov wrote: > > Deprecation period is run out and it's a time to flip the switch > > introduced by cd5ff8333a. Disable legacy option for new machine > > types (since 5.1) and amend documentation. > > > > '-numa node,memdev' shall be used instead of disabled option > > with new machine types. > > > > Signed-off-by: Igor Mammedov > > Reviewed-by: Michal Privoznik > > Reviewed-by: Michael S. Tsirkin > Thanks! > numa things so I'm guessing Eduardo's tree? yep, it's pure NUMA so it should go via Eduardo's tree. > > --- > > v1: > > - rebased on top of current master > > - move compat mode from 4.2 to 5.0 > > v2: > > - move deprection text to recently removed section > > v3: > > - increase title line length for (deprecated.rst) > > '``-numa node,mem=``\ *size* (removed in 5.1)' > > v4: > > - use error_append_hint() for suggesting valid CLI > > > > CC: peter.mayd...@linaro.org > > CC: ehabk...@redhat.com > > CC: marcel.apfelb...@gmail.com > > CC: m...@redhat.com > > CC: pbonz...@redhat.com > > CC: r...@twiddle.net > > CC: da...@gibson.dropbear.id.au > > CC: libvir-list@redhat.com > > CC: qemu-...@nongnu.org > > CC: qemu-...@nongnu.org > > CC: ebl...@redhat.com > > CC: gr...@kaod.org > > --- > > docs/system/deprecated.rst | 37 - > > hw/arm/virt.c | 2 +- > > hw/core/numa.c | 7 +++ > > hw/i386/pc.c | 1 - > > hw/i386/pc_piix.c | 1 + > > hw/i386/pc_q35.c | 1 + > > hw/ppc/spapr.c | 2 +- > > qemu-options.hx| 9 + > > 8 files changed, 36 insertions(+), 24 deletions(-) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index 544ece0a45..e74a5717c8 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -101,23 +101,6 @@ error in the future. > > The ``-realtime mlock=on|off`` argument has been replaced by the > > ``-overcommit mem-lock=on|off`` argument. 
> > > > -``-numa node,mem=``\ *size* (since 4.1) > > -''''''''''''''''''''''''''''''''''''''' > > - > > -The parameter ``mem`` of ``-numa node`` is used to assign a part of > > -guest RAM to a NUMA node. But when using it, it's impossible to manage > > specified > > -RAM chunk on the host side (like bind it to a host node, setting bind > > policy, ...), > > -so guest end-ups with the fake NUMA configuration with suboptiomal > > performance. > > -However since 2014 there is an alternative way to assign RAM to a NUMA node > > -using parameter ``memdev``, which does the same as ``mem`` and adds > > -means to actualy manage node RAM on the host side. Use parameter ``memdev`` > > -with *memory-backend-ram* backend as an replacement for parameter ``mem`` > > -to achieve the same fake NUMA effect or a properly configured > > -*memory-backend-file* backend to actually benefit from NUMA configuration. > > -In future new machine versions will not accept the option but it will still > > -work with old machine types. User can check QAPI schema to see if the > > legacy > > -option is supported by looking at MachineInfo::numa-mem-supported property. > > - > > ``-numa`` node (without memory specified) (since 4.1) > > ''''''''''''''''''''''''''''''''''''''''''''''''''''' > > > > @@ -516,3 +499,23 @@ long starting at 1MiB, the old command:: > > can be rewritten as:: > > > >qemu-nbd -t --image-opts > > driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 > > + > > +Command line options > > + > > + > > +``-numa node,mem=``\ *size* (removed in 5.1) > > +'''''''''''''''''''''''''''''&
[PATCH v4] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprection text to recently removed section v3: - increase title line length for (deprecated.rst) '``-numa node,mem=``\ *size* (removed in 5.1)' v4: - use error_append_hint() for suggesting valid CLI CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org CC: ebl...@redhat.com CC: gr...@kaod.org --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 7 +++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 36 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 544ece0a45..e74a5717c8 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. 
-However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -516,3 +499,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +'''''''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actualy manage node RAM on the host side. Use parameter ``memdev`` +with *memory-backend-ram* backend as an replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +In future new machine versions will not accept the option but it will still +work with old machine types. 
User can check QAPI schema to see if the legacy +option is supported by looking at MachineInfo::numa-mem-supported property. diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c
Re: [PATCH v3] numa: forbid '-numa node,mem' for 5.1 and newer machine types
On Fri, 5 Jun 2020 18:47:58 +0200 Greg Kurz wrote: > On Fri, 5 Jun 2020 12:03:21 -0400 > Igor Mammedov wrote: > > > Deprecation period is run out and it's a time to flip the switch > > introduced by cd5ff8333a. Disable legacy option for new machine > > types (since 5.1) and amend documentation. > > > > '-numa node,memdev' shall be used instead of disabled option > > with new machine types. > > > > Signed-off-by: Igor Mammedov > > Reviewed-by: Michal Privoznik > > --- > > v1: > > - rebased on top of current master > > - move compat mode from 4.2 to 5.0 > > v2: > > - move deprection text to recently removed section > > v3: > > - increase title line length for (deprecated.rst) > > '``-numa node,mem=``\ *size* (removed in 5.1)' > > > > CC: peter.mayd...@linaro.org > > CC: ehabk...@redhat.com > > CC: marcel.apfelb...@gmail.com > > CC: m...@redhat.com > > CC: pbonz...@redhat.com > > CC: r...@twiddle.net > > CC: da...@gibson.dropbear.id.au > > CC: libvir-list@redhat.com > > CC: qemu-...@nongnu.org > > CC: qemu-...@nongnu.org > > --- > > docs/system/deprecated.rst | 37 - > > hw/arm/virt.c | 2 +- > > hw/core/numa.c | 6 ++ > > hw/i386/pc.c | 1 - > > hw/i386/pc_piix.c | 1 + > > hw/i386/pc_q35.c | 1 + > > hw/ppc/spapr.c | 2 +- > > qemu-options.hx| 9 + > > 8 files changed, 35 insertions(+), 24 deletions(-) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index f0061f94aa..502e41ff35 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -101,23 +101,6 @@ error in the future. > > The ``-realtime mlock=on|off`` argument has been replaced by the > > ``-overcommit mem-lock=on|off`` argument. > > > > -``-numa node,mem=``\ *size* (since 4.1) > > -''''''''''''''''''''''''''''''''''''''' > > - > > -The parameter ``mem`` of ``-numa node`` is used to assign a part of > > -guest RAM to a NUMA node. 
But when using it, it's impossible to manage > > specified > > -RAM chunk on the host side (like bind it to a host node, setting bind > > policy, ...), > > -so guest end-ups with the fake NUMA configuration with suboptiomal > > performance. > > -However since 2014 there is an alternative way to assign RAM to a NUMA node > > -using parameter ``memdev``, which does the same as ``mem`` and adds > > -means to actualy manage node RAM on the host side. Use parameter ``memdev`` > > -with *memory-backend-ram* backend as an replacement for parameter ``mem`` > > -to achieve the same fake NUMA effect or a properly configured > > -*memory-backend-file* backend to actually benefit from NUMA configuration. > > -In future new machine versions will not accept the option but it will still > > -work with old machine types. User can check QAPI schema to see if the > > legacy > > -option is supported by looking at MachineInfo::numa-mem-supported property. > > - > > ``-numa`` node (without memory specified) (since 4.1) > > ''''''''''''''''''''''''''''''''''''''''''''''''''''' > > > > @@ -512,3 +495,23 @@ long starting at 1MiB, the old command:: > > can be rewritten as:: > > > >qemu-nbd -t --image-opts > > driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 > > + > > +Command line options > > + > > + > > +``-numa node,mem=``\ *size* (removed in 5.1) > > +'''''''''''''''''''''''''''''''''''''''''''' > > + > > +The parameter ``mem`` of ``-numa node`` is used to assign a part of > > +guest RAM to a NUMA node. But when using it, it's impossible to manage > > specified > > +RAM chunk on the host side (like bind it
Re: [PATCH] numa: forbid '-numa node,mem' for 5.1 and newer machine types
On Thu, 4 Jun 2020 07:22:51 -0500 Eric Blake wrote: > On 6/2/20 3:41 AM, Igor Mammedov wrote: > > Deprecation period is run out and it's a time to flip the switch > > introduced by cd5ff8333a. Disable legacy option for new machine > > types (since 5.1) and amend documentation. > > > > '-numa node,memdev' shall be used instead of disabled option > > with new machine types. > > > > Signed-off-by: Igor Mammedov > > --- > > - rebased on top of current master > > - move compat mode from 4.2 to 5.0 > > > > > docs/system/deprecated.rst | 17 - > > Lately, when we remove something, we've been moving the documentation > from 'will be deprecated' to a later section of the document 'has been > removed', so that the history is not lost. But this diffstat says you > just deleted, rather than moved, that hunk. > I didn't know that, I'll send v2 with this hunk moved to removed section
[PATCH v2] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprection text to recently removed section - pick up reviewed-bys CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org CC: ebl...@redhat.com --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 35 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index f0061f94aa..6f717e4a1d 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. 
Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -512,3 +495,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actualy manage node RAM on the host side. Use parameter ``memdev`` +with *memory-backend-ram* backend as an replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +In future new machine versions will not accept the option but it will still +work with old machine types. User can check QAPI schema to see if the legacy +option is supported by looking at MachineInfo::numa-mem-supported property. 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2262,7 +2262,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc->pre_plug = virt_machine_device_pre_plug_cb; hc->plug = virt_machine_device_plug_cb;
[PATCH v3] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprection text to recently removed section v3: - increase title line length for (deprecated.rst) '``-numa node,mem=``\ *size* (removed in 5.1)' CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 35 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index f0061f94aa..502e41ff35 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. 
Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -512,3 +495,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +'''''''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actualy manage node RAM on the host side. Use parameter ``memdev`` +with *memory-backend-ram* backend as an replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +In future new machine versions will not accept the option but it will still +work with old machine types. User can check QAPI schema to see if the legacy +option is supported by looking at MachineInfo::numa-mem-supported property. 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2262,7 +2262,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc-&g
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Thu, 4 Jun 2020 10:58:01 +0200 Michal Privoznik wrote: > On 5/27/20 3:58 PM, Igor Mammedov wrote: > > On Tue, 26 May 2020 17:31:09 +0200 > > Michal Privoznik wrote: > > > >> On 5/26/20 4:51 PM, Igor Mammedov wrote: > >>> On Mon, 25 May 2020 10:05:08 +0200 > >>> Michal Privoznik wrote: > >>> > >>>> > >>>> This is a problem. The domain XML that is provided can't be changed, > >>>> mostly because mgmt apps construct it on the fly and then just pass it > >>>> as a RO string to libvirt. While libvirt could create a separate cache, > >>>> there has to be a better way. > >>>> > >>>> I mean, I can add some more code that once the guest is running > >>>> preserves the mapping during migration. But that assumes a running QEMU. > >>>> When starting a domain from scratch, is it acceptable it vCPU topology > >>>> changes? I suspect it is not. > >>> I'm not sure I got you but > >>> vCPU topology isn't changnig but when starting QEMU, user has to map > >>> 'concrete vCPUs' to spencific numa nodes. The issue here is that > >>> to specify concrete vCPUs user needs to get layout from QEMU first > >>> as it's a function of target/machine/-smp and possibly cpu type. > >> > >> Assume the following config: 4 vCPUs (2 sockets, 2 cores, 1 thread > >> topology) and 2 NUMA nodes and the following assignment to NUMA: > >> > >> node 0: cpus=0-1 > >> node 1: cpus=2-3 > >> > >> With old libvirt & qemu (and assuming x86_64 - not EPYC), I assume the > >> following topology is going to be used: > >> > >> node 0: socket=0,core=0,thread=0 (vCPU0) socket=0,core=1,thread=0 (vCPU1) > >> node 1: socket=1,core=0,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) > >> > >> Now, user upgrades libvirt & qemu but doesn't change the config. 
And on > >> a fresh new start (no migration), they might get a different topology: > >> > >> node 0: socket=0,core=0,thread=0 (vCPU0) socket=1,core=0,thread=0 (vCPU1) > >> node 1: socket=0,core=1,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) > >> > > > > that shouldn't happen at least for as long as machine version stays the > > same > > Shouldn't as in it's bad if it happens or as in QEMU won't change > topology for released machine types? it's the second > Well, we are talking about libvirt > generating the topology. > > >> The problem here is not how to assign vCPUs to NUMA nodes, the problem > >> is how to translate vCPU IDs to socket=,core=,thread=. > > if you are talking about libvirt's vCPU IDs, then it's separate issue > > as it's user facing API, I think it should not rely on cpu_index. > > Instead it should map vCPU IDs to ([socket,]core[,thread]) tuple > > or maybe drop notion of vCPU IDs and expose ([socket,]core[,thread]) > > to users if they ask for numa aware config. > > And this is the thing I am asking. How to map vCPU IDs to > socket,core,thread and how to do it reliably. vCPU ID has the same drawbacks as cpu_index in QEMU, it provides zero information about topology. Which is fine in non NUMA case since user doesn't care about topology at all (I'm assuming it's libvirt who does pinning and it would use topology info to pin vcpus correctly). But for NUMA case, as a user I'd like to see/use topology instead of vCPU ID, especially if user is in charge of assigning vCPUs to nodes. I'd drop vCPU IDs concept altogether and use ([socket,]core[,thread]) tuple to describe vCPUs instead. It should work fine for both usecases and you wouldn't have to do mapping to vCPU IDs. (I'm talking here about new configs that use new machine types and ignore compatibility. More on the later see below) > > > > PS: > > I'm curious how libvirt currently implements numa mapping and > > how it's correlated with pinnig to host nodes? 
> > Does it have any sort of code to calculate topology based on cpu_index > > so it could properly assign vCPUs to nodes or all the pain of > > assigning vCPU IDs to nodes is on the user shoulders? > > It's on users. In the domain XML they specify number of vCPUs, and then > they can assign individual IDs to NUMA nodes. For instance: > >8 > > > > > > > > > translates to: > >-smp 8,sockets=8,cores=1,threads=1 >-numa node,nodeid
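The socket/core/thread layouts debated in this thread follow a simple linear decomposition on x86. A minimal sketch, assuming x86-style ordering where the thread index varies fastest and the socket index slowest (as in QEMU's x86_topo_ids_from_idx()); other machine types may order CPUs differently, which is exactly why the thread recommends querying QEMU rather than hardcoding this:

```python
# Linear vCPU-index -> (socket, core, thread) decomposition, assuming
# x86-style ordering (thread varies fastest, socket slowest).  Other
# machine types may use a different layout.
def vcpu_topo(idx: int, sockets: int, cores: int, threads: int):
    thread = idx % threads
    core = (idx // threads) % cores
    socket = idx // (cores * threads)
    assert socket < sockets, "vCPU index out of range for this topology"
    return socket, core, thread

# The 2-socket / 2-core / 1-thread example from the thread:
# vCPUs 0-1 land on socket 0 (NUMA node 0), vCPUs 2-3 on socket 1.
layout = [vcpu_topo(i, 2, 2, 1) for i in range(4)]
# layout == [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
```

This reproduces the "old libvirt & qemu" layout Michal quotes above; the "different topology" he worries about corresponds to a machine type that enumerates CPUs in another order.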
Re: [PATCH 5/5] qemu_validate.c: revert NUMA CPU range user warning
On Mon, 1 Jun 2020 17:14:20 -0300 Daniel Henrique Barboza wrote: > On 6/1/20 4:40 PM, Peter Krempa wrote: > > On Mon, Jun 01, 2020 at 14:50:41 -0300, Daniel Henrique Barboza wrote: > >> Now that we have the auto-fill code in place, and with proper documentation > >> to let the user know that (1) we will auto-fill the NUMA cpus up to the > >> number to maximum VCPUs number if QEMU supports it and (2) the user > >> is advised to always supply a complete NUMA topology, this warning > >> is unneeded. > >> > >> This reverts commit 38d2e033686b5cc274f8f55075ce1985b71e329a. > > > > Since we already have the validation in place for some time now I think > > we should just keep it. The auto-filling would be a useful hack to work > > around if config breaks, but judged by itself it's of questionable > > benefit. > > That's a good point. I agree that removing the message after being in place > for this long is more trouble than it's worth. > > > > > Specifically users might end up with a topology which they didn't > > expect. Reasoning is basically the same as with qemu. Any default > > behaviour here is a policy decision and it might not suit all uses. > > > > > An ideal situation would be QEMU to never accept incomplete NUMA topologies > in the first place. At least with your series I can safely drop deprecated incomplete NUMA topologies on QEMU side (which were producing warnings for a while) > > Given that this wasn't the case and now there might be a plethora of guests > running with goofy topologies all around, the already existing warning > message + this auto-fill hack + documentation mentioning that users should > avoid these topologies is a fine solution from Libvirt side, in my > estimation. > > > Thanks, > > > DHB >
[PATCH] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov --- - rebased on top of current master - move compat mode from 4.2 to 5.0 CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org --- docs/system/deprecated.rst | 17 - hw/arm/virt.c | 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 15 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index f0061f94aa..57edc075c2 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. 
Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2262,7 +2262,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc->pre_plug = virt_machine_device_pre_plug_cb; hc->plug = virt_machine_device_plug_cb; hc->unplug_request = virt_machine_device_unplug_request_cb; -mc->numa_mem_supported = true; mc->nvdimm_supported = true; mc->auto_enable_numa_with_memhp = true; mc->default_ram_id = "mach-virt.ram"; @@ -2375,6 +2374,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 1) static void virt_machine_5_0_options(MachineClass *mc) { virt_machine_5_1_options(mc); +mc->numa_mem_supported = true; } DEFINE_VIRT_MACHINE(5, 0) diff --git a/hw/core/numa.c b/hw/core/numa.c index 316bc50d75..05be412e59 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -117,6 +117,12 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, } if (node->has_mem) { +if (!mc->numa_mem_supported) { +error_setg(errp, "Parameter -numa node,mem is not supported by this" + " machine type. 
Use -numa node,memdev instead"); +return; +} + numa_info[nodenr].node_mem = node->mem; if (!qtest_enabled()) { warn_report("Parameter -numa node,mem is deprecated," diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2128f3d6fe..a86136069c 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1960,7 +1960,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) hc->unplug = pc_machine_device_unplug_cb; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; -mc->numa_mem_supported = true; mc->default_ram_id = "pc.ram"; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix
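For readers tracking the CLI change in the patch above, a sketch of the move from the deprecated option to its replacement (sizes and object IDs here are illustrative, following the documentation text being amended):

```sh
# Deprecated form (only accepted by pre-5.1 machine types):
qemu-system-x86_64 -m 4G \
    -numa node,nodeid=0,mem=2G -numa node,nodeid=1,mem=2G

# Replacement: back each node with an explicit memory backend.
# memory-backend-ram reproduces the old "fake NUMA" behaviour;
# use memory-backend-file to actually bind node RAM on the host.
qemu-system-x86_64 -m 4G \
    -object memory-backend-ram,id=ram-node0,size=2G \
    -object memory-backend-ram,id=ram-node1,size=2G \
    -numa node,nodeid=0,memdev=ram-node0 \
    -numa node,nodeid=1,memdev=ram-node1
```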
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Tue, 26 May 2020 17:31:09 +0200 Michal Privoznik wrote: > On 5/26/20 4:51 PM, Igor Mammedov wrote: > > On Mon, 25 May 2020 10:05:08 +0200 > > Michal Privoznik wrote: > > > >> > >> This is a problem. The domain XML that is provided can't be changed, > >> mostly because mgmt apps construct it on the fly and then just pass it > >> as a RO string to libvirt. While libvirt could create a separate cache, > >> there has to be a better way. > >> > >> I mean, I can add some more code that once the guest is running > >> preserves the mapping during migration. But that assumes a running QEMU. > >> When starting a domain from scratch, is it acceptable it vCPU topology > >> changes? I suspect it is not. > > I'm not sure I got you but > > vCPU topology isn't changnig but when starting QEMU, user has to map > > 'concrete vCPUs' to spencific numa nodes. The issue here is that > > to specify concrete vCPUs user needs to get layout from QEMU first > > as it's a function of target/machine/-smp and possibly cpu type. > > Assume the following config: 4 vCPUs (2 sockets, 2 cores, 1 thread > topology) and 2 NUMA nodes and the following assignment to NUMA: > > node 0: cpus=0-1 > node 1: cpus=2-3 > > With old libvirt & qemu (and assuming x86_64 - not EPYC), I assume the > following topology is going to be used: > > node 0: socket=0,core=0,thread=0 (vCPU0) socket=0,core=1,thread=0 (vCPU1) > node 1: socket=1,core=0,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) > > Now, user upgrades libvirt & qemu but doesn't change the config. 
And on > a fresh new start (no migration), they might get a different topology: > > node 0: socket=0,core=0,thread=0 (vCPU0) socket=1,core=0,thread=0 (vCPU1) > node 1: socket=0,core=1,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) that shouldn't happen, at least for as long as the machine version stays the same > (This is a very trivial example that I am intentionally making look bad, > but the thing is, there are some CPUs with very weird vCPU -> > socket/core/thread mappings). > > The problem here is that with this new version it is libvirt who > configured the vCPU -> NUMA mapping (using -numa cpu). Why so wrong? > Well it had no way to ask qemu how it used to be. Okay, so we add an > interface to QEMU (say -preconfig + query-hotpluggable-cpus) which will > do the mapping and keep it there indefinitely. But if the interface is > already there (and "always" will be), I don't see need for the extra > step (libvirt asking QEMU for the old mapping). With cpu_index, users don't know which CPUs they assign where, and in some cases (spapr) it doesn't really map onto the board's supported CPU model well. We can add and keep cpu_index in query-hotpluggable-cpus to help with migration for old machine types from the old CLI to the new one, but otherwise cpu_index would disappear from the user-visible interface. I'd like to drop the duplicate code supporting the ambiguous '-numa node,cpus' (a not always properly working interface) and keep only a single variant, '-numa cpu=', to do NUMA mapping, which uses the CPU's topology properties to describe CPUs and unifies it with the way it's done with CPU hotplug. > The problem here is not how to assign vCPUs to NUMA nodes, the problem > is how to translate vCPU IDs to socket=,core=,thread=. if you are talking about libvirt's vCPU IDs, then it's a separate issue as it's a user-facing API, I think it should not rely on cpu_index.
Instead it should map vCPU IDs to a ([socket,]core[,thread]) tuple, or maybe drop the notion of vCPU IDs and expose ([socket,]core[,thread]) to users if they ask for a NUMA-aware config. PS: I'm curious how libvirt currently implements NUMA mapping and how it's correlated with pinning to host nodes? Does it have any sort of code to calculate topology based on cpu_index so it could properly assign vCPUs to nodes, or is all the pain of assigning vCPU IDs to nodes on the user's shoulders? > > that applies not only '-numa cpu' but also to -device cpufoo, > > that's why query-hotpluggable-cpus was introduced to let > > user get the list of possible CPUs (including topo properties needed to > > create them) for a given set of CLI options. > > > > If I recall right libvirt uses topo properties during cpu hotplug but > > treats it mainly as opaque info so it could feed it back to QEMU. > > > > > >>>> tries to avoid that as much as it can. > >>>> > >>>>> > >>>>> How to present it to libvirt user I'm not sure (give them that list > >>
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Mon, 25 May 2020 10:05:08 +0200 Michal Privoznik wrote: > On 5/22/20 7:18 PM, Igor Mammedov wrote: > > On Fri, 22 May 2020 18:28:31 +0200 > > Michal Privoznik wrote: > > > >> On 5/22/20 6:07 PM, Igor Mammedov wrote: > >>> On Fri, 22 May 2020 16:14:14 +0200 > >>> Michal Privoznik wrote: > >>> > >>>> QEMU is trying to obsolete -numa node,cpus= because that uses > >>>> ambiguous vCPU id to [socket, die, core, thread] mapping. The new > >>>> form is: > >>>> > >>>> -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T > >>>> > >>>> which is repeated for every vCPU and places it at [S, D, C, T] > >>>> into guest NUMA node N. > >>>> > >>>> While in general this is magic mapping, we can deal with it. > >>>> Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology > >>>> is given then maxvcpus must be sockets * dies * cores * threads > >>>> (i.e. there are no 'holes'). > >>>> Secondly, if no topology is given then libvirt itself places each > >>>> vCPU into a different socket (basically, it fakes topology of: > >>>> [maxvcpus, 1, 1, 1]) > >>>> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs > >>>> onto topology, to make sure vCPUs don't start to move around. > >>>> > >>>> Note, migration from old to new cmd line works and therefore > >>>> doesn't need any special handling. 
> >>>> > >>>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085 > >>>> > >>>> Signed-off-by: Michal Privoznik > >>>> --- > >>>>src/qemu/qemu_command.c | 108 +- > >>>>.../hugepages-nvdimm.x86_64-latest.args | 4 +- > >>>>...memory-default-hugepage.x86_64-latest.args | 10 +- > >>>>.../memfd-memory-numa.x86_64-latest.args | 10 +- > >>>>...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +- > >>>>...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +- > >>>>...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +- > >>>>...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +- > >>>>...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +- > >>>>...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +- > >>>>.../memory-hotplug-nvdimm.x86_64-latest.args | 4 +- > >>>>...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +- > >>>>...vhost-user-fs-hugepages.x86_64-latest.args | 4 +- > >>>>...host-user-gpu-secondary.x86_64-latest.args | 3 +- > >>>>.../vhost-user-vga.x86_64-latest.args | 3 +- > >>>>15 files changed, 158 insertions(+), 16 deletions(-) > >>>> > >>>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > >>>> index 7d84fd8b5e..0de4fe4905 100644 > >>>> --- a/src/qemu/qemu_command.c > >>>> +++ b/src/qemu/qemu_command.c > >>>> @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf, > >>>>} > >>>> > >>>> > >>>> +/** > >>>> + * qemuTranlsatevCPUID: > >>>> + * > >>>> + * For given vCPU @id and vCPU topology (@cpu) compute corresponding > >>>> + * @socket, @die, @core and @thread). This assumes linear topology, > >>>> + * that is every [socket, die, core, thread] combination is valid vCPU > >>>> + * ID and there are no 'holes'. This is ensured by > >>>> + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is > >>>> + * set. > >>> I wouldn't make this assumption, each machine can have (and has) it's own > >>> layout, > >>> and now it's not hard to change that per machine version if necessary. 
> >>> > >>> I'd suppose one could pull the list of possible CPUs from QEMU started > >>> in preconfig mode with desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS > >>> and then continue to configure numa with QMP commands using provided > >>> CPUs layout. > >> > >> Continue where? At the 'preconfig mode' the guest is already started, > >> isn't it? Are you suggesting that libvirt starts a dummy QEMU process
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Fri, 22 May 2020 18:28:31 +0200 Michal Privoznik wrote: > On 5/22/20 6:07 PM, Igor Mammedov wrote: > > On Fri, 22 May 2020 16:14:14 +0200 > > Michal Privoznik wrote: > > > >> QEMU is trying to obsolete -numa node,cpus= because that uses > >> ambiguous vCPU id to [socket, die, core, thread] mapping. The new > >> form is: > >> > >>-numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T > >> > >> which is repeated for every vCPU and places it at [S, D, C, T] > >> into guest NUMA node N. > >> > >> While in general this is magic mapping, we can deal with it. > >> Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology > >> is given then maxvcpus must be sockets * dies * cores * threads > >> (i.e. there are no 'holes'). > >> Secondly, if no topology is given then libvirt itself places each > >> vCPU into a different socket (basically, it fakes topology of: > >> [maxvcpus, 1, 1, 1]) > >> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs > >> onto topology, to make sure vCPUs don't start to move around. > >> > >> Note, migration from old to new cmd line works and therefore > >> doesn't need any special handling. 
> >> > >> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085 > >> > >> Signed-off-by: Michal Privoznik > >> --- > >> src/qemu/qemu_command.c | 108 +- > >> .../hugepages-nvdimm.x86_64-latest.args | 4 +- > >> ...memory-default-hugepage.x86_64-latest.args | 10 +- > >> .../memfd-memory-numa.x86_64-latest.args | 10 +- > >> ...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +- > >> ...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +- > >> ...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +- > >> ...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +- > >> ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +- > >> ...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +- > >> .../memory-hotplug-nvdimm.x86_64-latest.args | 4 +- > >> ...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +- > >> ...vhost-user-fs-hugepages.x86_64-latest.args | 4 +- > >> ...host-user-gpu-secondary.x86_64-latest.args | 3 +- > >> .../vhost-user-vga.x86_64-latest.args | 3 +- > >> 15 files changed, 158 insertions(+), 16 deletions(-) > >> > >> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > >> index 7d84fd8b5e..0de4fe4905 100644 > >> --- a/src/qemu/qemu_command.c > >> +++ b/src/qemu/qemu_command.c > >> @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf, > >> } > >> > >> > >> +/** > >> + * qemuTranlsatevCPUID: > >> + * > >> + * For given vCPU @id and vCPU topology (@cpu) compute corresponding > >> + * @socket, @die, @core and @thread). This assumes linear topology, > >> + * that is every [socket, die, core, thread] combination is valid vCPU > >> + * ID and there are no 'holes'. This is ensured by > >> + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is > >> + * set. > > I wouldn't make this assumption, each machine can have (and has) it's own > > layout, > > and now it's not hard to change that per machine version if necessary. 
> > > > I'd suppose one could pull the list of possible CPUs from QEMU started > > in preconfig mode with desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS > > and then continue to configure numa with QMP commands using provided > > CPUs layout. > > Continue where? At the 'preconfig mode' the guest is already started, > isn't it? Are you suggesting that libvirt starts a dummy QEMU process, > fetches the CPU topology from it and then starts it for real? Libvirt QEMU is started, but it's very far from starting the guest; at that point it's possible to configure the NUMA mapping at runtime and continue to the -S or running state without restarting QEMU. For follow-up starts, the used topology and numa options can be cached and reused at CLI time as long as the machine/-smp combination stays the same. > tries to avoid that as much as it can. > > > > How to present it to libvirt user I'm not sure (give them that list perhaps > > and let them select from it???) > > This is what I am trying to figure out in the cover letter. Maybe we > need to let users configure the topology (well, vCPU id to [socket, die, &
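The preconfig flow Igor sketches — start QEMU paused before machine init, query the possible CPUs, then wire up NUMA over QMP — looks roughly like this. The exact invocation is a sketch of QEMU's -preconfig/QMP interface of that era, not a verbatim transcript:

```sh
# Start QEMU in preconfig mode; the machine is not built yet, so NUMA
# CPU bindings may still be configured over QMP.
qemu-system-x86_64 -preconfig -S \
    -qmp unix:/tmp/qmp.sock,server,nowait \
    -machine q35 -smp 4,sockets=2,cores=2,threads=1

# Then, over the QMP socket:
#   {"execute": "qmp_capabilities"}
#   {"execute": "query-hotpluggable-cpus"}   <- returns the valid
#       [socket, core, thread] tuples for this machine/-smp combo
#   {"execute": "set-numa-node", "arguments":
#       {"type": "cpu", "node-id": 0, "socket-id": 0,
#        "core-id": 0, "thread-id": 0}}      <- repeat per vCPU
#   {"execute": "x-exit-preconfig"}          <- proceed with startup
```

This is the "no dummy QEMU process" point: the same process that answers query-hotpluggable-cpus goes on to run the guest.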
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Fri, 22 May 2020 16:14:14 +0200 Michal Privoznik wrote: > QEMU is trying to obsolete -numa node,cpus= because that uses > ambiguous vCPU id to [socket, die, core, thread] mapping. The new > form is: > > -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T > > which is repeated for every vCPU and places it at [S, D, C, T] > into guest NUMA node N. > > While in general this is magic mapping, we can deal with it. > Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology > is given then maxvcpus must be sockets * dies * cores * threads > (i.e. there are no 'holes'). > Secondly, if no topology is given then libvirt itself places each > vCPU into a different socket (basically, it fakes topology of: > [maxvcpus, 1, 1, 1]) > Thirdly, we can copy whatever QEMU is doing when mapping vCPUs > onto topology, to make sure vCPUs don't start to move around. > > Note, migration from old to new cmd line works and therefore > doesn't need any special handling. > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085 > > Signed-off-by: Michal Privoznik > --- > src/qemu/qemu_command.c | 108 +- > .../hugepages-nvdimm.x86_64-latest.args | 4 +- > ...memory-default-hugepage.x86_64-latest.args | 10 +- > .../memfd-memory-numa.x86_64-latest.args | 10 +- > ...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +- > ...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +- > ...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +- > ...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +- > ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +- > ...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +- > .../memory-hotplug-nvdimm.x86_64-latest.args | 4 +- > ...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +- > ...vhost-user-fs-hugepages.x86_64-latest.args | 4 +- > ...host-user-gpu-secondary.x86_64-latest.args | 3 +- > .../vhost-user-vga.x86_64-latest.args | 3 +- > 15 files changed, 158 insertions(+), 16 deletions(-) > > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > 
index 7d84fd8b5e..0de4fe4905 100644 > --- a/src/qemu/qemu_command.c > +++ b/src/qemu/qemu_command.c > @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf, > } > > > +/** > + * qemuTranlsatevCPUID: > + * > + * For given vCPU @id and vCPU topology (@cpu) compute corresponding > + * @socket, @die, @core and @thread). This assumes linear topology, > + * that is every [socket, die, core, thread] combination is valid vCPU > + * ID and there are no 'holes'. This is ensured by > + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is > + * set. I wouldn't make this assumption, each machine can have (and has) it's own layout, and now it's not hard to change that per machine version if necessary. I'd suppose one could pull the list of possible CPUs from QEMU started in preconfig mode with desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS and then continue to configure numa with QMP commands using provided CPUs layout. How to present it to libvirt user I'm not sure (give them that list perhaps and let select from it???) But it's irrelevant, to the patch, magical IDs for socket/core/...whatever should not be generated by libvirt anymore, but rather taken from QEMU for given machine + -smp combination. CCing Peter, as I vaguely recall him working on this issue (preconfig + numa over QMP) > + * Moreover, if @diesSupported is false (QEMU lacks > + * QEMU_CAPS_SMP_DIES) then @die is set to zero and @socket is > + * computed without taking numbed of dies into account. > + * > + * The algorithm is shamelessly copied over from QEMU's > + * x86_topo_ids_from_idx() and its history (before introducing dies). 
> + */ > +static void > +qemuTranlsatevCPUID(unsigned int id, > +bool diesSupported, > +virCPUDefPtr cpu, > +unsigned int *socket, > +unsigned int *die, > +unsigned int *core, > +unsigned int *thread) > +{ > +if (cpu && cpu->sockets) { > +*thread = id % cpu->threads; > +*core = id / cpu->threads % cpu->cores; > +if (diesSupported) { > +*die = id / (cpu->cores * cpu->threads) % cpu->dies; > +*socket = id / (cpu->dies * cpu->cores * cpu->threads); > +} else { > +*die = 0; > +*socket = id / (cpu->cores * cpu->threads) % cpu->sockets; > +} > +} else { > +/* If no topology was provided, then qemuBuildSmpCommandLine() > + * puts all vCPUs into a separate socket. */ > +*thread = 0; > +*core = 0; > +*die = 0; > +*socket = id; > +} > +} > + > + > +static void > +qemuBuildNumaNewCPUs(virCommandPtr cmd, > + virCPUDefPtr cpu, > + virBitmapPtr cpumask, > + size_t nodeid, > + virQEMUCapsPtr qemuCap
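As a cross-check of the translation logic in the patch above, here is a direct Python rendering of its qemuTranlsatevCPUID() helper (the function name, arithmetic, and linear-topology assumption all come from the patch; this is only a sketch to make the math easy to poke at):

```python
# Python rendering of the patch's qemuTranlsatevCPUID() logic:
# decompose a linear vCPU id into (socket, die, core, thread).
# When dies are not supported, die is 0 and the socket is computed
# without the dies factor, as in the C code.
def translate_vcpu_id(vcpu_id, sockets, dies, cores, threads,
                      dies_supported=True):
    if sockets:
        thread = vcpu_id % threads
        core = vcpu_id // threads % cores
        if dies_supported:
            die = vcpu_id // (cores * threads) % dies
            socket = vcpu_id // (dies * cores * threads)
        else:
            die = 0
            socket = vcpu_id // (cores * threads) % sockets
    else:
        # No topology given: qemuBuildSmpCommandLine() puts every
        # vCPU into its own socket.
        thread = core = die = 0
        socket = vcpu_id
    return socket, die, core, thread
```

For example, with -smp 8,sockets=2,dies=1,cores=2,threads=2, vCPU 5 maps to socket 1, die 0, core 0, thread 1.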
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node,mem' for 5.0 and newer machine types
On Thu, 16 Jan 2020 13:06:28 + Daniel P. Berrangé wrote: > On Thu, Jan 16, 2020 at 01:37:03PM +0100, Igor Mammedov wrote: > > On Thu, 16 Jan 2020 11:42:09 +0100 > > Michal Privoznik wrote: > > > > > On 1/15/20 5:52 PM, Igor Mammedov wrote: > > > > On Wed, 15 Jan 2020 16:34:53 +0100 > > > > Peter Krempa wrote: > > > > > > > >> On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > > > >>> Deprecation period is ran out and it's a time to flip the switch > > > >>> introduced by cd5ff8333a. > > > >>> Disable legacy option for new machine types and amend documentation. > > > >>> > > > >>> Signed-off-by: Igor Mammedov > > > >>> --- > > > >>> CC: peter.mayd...@linaro.org > > > >>> CC: ehabk...@redhat.com > > > >>> CC: marcel.apfelb...@gmail.com > > > >>> CC: m...@redhat.com > > > >>> CC: pbonz...@redhat.com > > > >>> CC: r...@twiddle.net > > > >>> CC: da...@gibson.dropbear.id.au > > > >>> CC: libvir-list@redhat.com > > > >>> CC: qemu-...@nongnu.org > > > >>> CC: qemu-...@nongnu.org > > > >>> --- > > > >>> hw/arm/virt.c| 2 +- > > > >>> hw/core/numa.c | 6 ++ > > > >>> hw/i386/pc.c | 1 - > > > >>> hw/i386/pc_piix.c| 1 + > > > >>> hw/i386/pc_q35.c | 1 + > > > >>> hw/ppc/spapr.c | 2 +- > > > >>> qemu-deprecated.texi | 16 > > > >>> qemu-options.hx | 8 > > > >>> 8 files changed, 14 insertions(+), 23 deletions(-) > > > >> > > > >> I'm afraid nobody bothered to fix it yet: > > > >> > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=1783355 > > > > > > > > It's time to start working on it :) > > > > (looks like just deprecating stuff isn't sufficient motivation, > > > > maybe actual switch flipping would work out better) > > > > > > > > > > So how was the upgrade from older to newer version resolved? I mean, if > > > the old qemu used -numa node,mem=XXX and it is migrated to a host with > > > newer qemu, the cmd line can't be switched to -numa node,memdev=node0, > > > can it? I'm asking because I've just started working on this. 
> > see commit cd5ff8333a3c87 for detailed info. > > Short answer is it's not really resolved [*], > > -numa node,mem will keep working on newer QEMU but only for old machine > > types > > new machine types will accept only -numa node,memdev. > > > > One can check if 'mem=' is supported by using QAPI query-machines > > and checking numa-mem-supported field. That field is flipped to false > > for 5.0 and later machine types in this patch. > > Since libvirt dropped the ball here, can we postpone this change > to machine types until a later release. Looks like we have to at this point. We can do this for the [82-86/86] patches, which are mostly NUMA-related changes. The rest could go in this release as it is independent of NUMA; it mainly introduces a memdev backend for main RAM and consolidates the twisted main RAM allocation logic. > > Regards, > Daniel
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node,mem' for 5.0 and newer machine types
On Thu, 16 Jan 2020 14:03:12 +0100 Michal Privoznik wrote: > On 1/16/20 1:37 PM, Igor Mammedov wrote: > > On Thu, 16 Jan 2020 11:42:09 +0100 > > Michal Privoznik wrote: > > > >> On 1/15/20 5:52 PM, Igor Mammedov wrote: > >>> On Wed, 15 Jan 2020 16:34:53 +0100 > >>> Peter Krempa wrote: > >>> > >>>> On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > >>>>> Deprecation period is ran out and it's a time to flip the switch > >>>>> introduced by cd5ff8333a. > >>>>> Disable legacy option for new machine types and amend documentation. > >>>>> > >>>>> Signed-off-by: Igor Mammedov > >>>>> --- > >>>>> CC: peter.mayd...@linaro.org > >>>>> CC: ehabk...@redhat.com > >>>>> CC: marcel.apfelb...@gmail.com > >>>>> CC: m...@redhat.com > >>>>> CC: pbonz...@redhat.com > >>>>> CC: r...@twiddle.net > >>>>> CC: da...@gibson.dropbear.id.au > >>>>> CC: libvir-list@redhat.com > >>>>> CC: qemu-...@nongnu.org > >>>>> CC: qemu-...@nongnu.org > >>>>> --- > >>>>>hw/arm/virt.c| 2 +- > >>>>>hw/core/numa.c | 6 ++ > >>>>>hw/i386/pc.c | 1 - > >>>>>hw/i386/pc_piix.c| 1 + > >>>>>hw/i386/pc_q35.c | 1 + > >>>>>hw/ppc/spapr.c | 2 +- > >>>>>qemu-deprecated.texi | 16 > >>>>>qemu-options.hx | 8 > >>>>>8 files changed, 14 insertions(+), 23 deletions(-) > >>>> > >>>> I'm afraid nobody bothered to fix it yet: > >>>> > >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1783355 > >>> > >>> It's time to start working on it :) > >>> (looks like just deprecating stuff isn't sufficient motivation, > >>> maybe actual switch flipping would work out better) > >>> > >> > >> So how was the upgrade from older to newer version resolved? I mean, if > >> the old qemu used -numa node,mem=XXX and it is migrated to a host with > >> newer qemu, the cmd line can't be switched to -numa node,memdev=node0, > >> can it? I'm asking because I've just started working on this. > > > > see commit cd5ff8333a3c87 for detailed info. 
> > Short answer is it's not really resolved [*], > > -numa node,mem will keep working on newer QEMU but only for old machine > > types > > new machine types will accept only -numa node,memdev. > > > > One can check if 'mem=' is supported by using QAPI query-machines > > and checking numa-mem-supported field. That field is flipped to false > > for 5.0 and later machine types in this patch. > > Alright, so what we can do is the following: > > 1) For new machine types (pc-5.0/q35-5.0 and newer) use memdev= always. it's not only x86, it's for all machines that support NUMA; hence numa-mem-supported was introduced, to make it easier for libvirt to figure out when to use which syntax. The plan was to release libvirt with support for numa-mem-supported and then, when newer QEMU forbids 'mem=', the change would be transparent for relatively fresh libvirt. Whether that still makes sense is questionable, though. We could go with your suggestion, in which case libvirt unilaterally switches to using only 'memdev' for 5.0 machine types and then later (5.1 or so) we release a QEMU that enforces it. In that case we can axe numa-mem-supported (I'd volunteer) to avoid supporting yet another ABI/smart logic where your way could be sufficient. Daniel, what's your take on Michal's approach? > 2) For older machine types, we are stuck with mem= until qemu is capable > of migrating from mem= to memdev= > > I think this is a safe thing to do since migrating from one version of a > machine type to another is not supported (since it can change guest > ABI). And we will see how much 2) bothers us. Does this sound reasonable? > > Michal > >
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node,mem' for 5.0 and newer machine types
On Thu, 16 Jan 2020 11:42:09 +0100 Michal Privoznik wrote: > On 1/15/20 5:52 PM, Igor Mammedov wrote: > > On Wed, 15 Jan 2020 16:34:53 +0100 > > Peter Krempa wrote: > > > >> On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > >>> Deprecation period is ran out and it's a time to flip the switch > >>> introduced by cd5ff8333a. > >>> Disable legacy option for new machine types and amend documentation. > >>> > >>> Signed-off-by: Igor Mammedov > >>> --- > >>> CC: peter.mayd...@linaro.org > >>> CC: ehabk...@redhat.com > >>> CC: marcel.apfelb...@gmail.com > >>> CC: m...@redhat.com > >>> CC: pbonz...@redhat.com > >>> CC: r...@twiddle.net > >>> CC: da...@gibson.dropbear.id.au > >>> CC: libvir-list@redhat.com > >>> CC: qemu-...@nongnu.org > >>> CC: qemu-...@nongnu.org > >>> --- > >>> hw/arm/virt.c| 2 +- > >>> hw/core/numa.c | 6 ++ > >>> hw/i386/pc.c | 1 - > >>> hw/i386/pc_piix.c| 1 + > >>> hw/i386/pc_q35.c | 1 + > >>> hw/ppc/spapr.c | 2 +- > >>> qemu-deprecated.texi | 16 > >>> qemu-options.hx | 8 > >>> 8 files changed, 14 insertions(+), 23 deletions(-) > >> > >> I'm afraid nobody bothered to fix it yet: > >> > >> https://bugzilla.redhat.com/show_bug.cgi?id=1783355 > > > > It's time to start working on it :) > > (looks like just deprecating stuff isn't sufficient motivation, > > maybe actual switch flipping would work out better) > > > > So how was the upgrade from older to newer version resolved? I mean, if > the old qemu used -numa node,mem=XXX and it is migrated to a host with > newer qemu, the cmd line can't be switched to -numa node,memdev=node0, > can it? I'm asking because I've just started working on this. see commit cd5ff8333a3c87 for detailed info. Short answer is it's not really resolved [*], -numa node,mem will keep working on newer QEMU but only for old machine types new machine types will accept only -numa node,memdev. One can check if "mem=' is supported by using QAPI query-machines and checking numa-mem-supported field. 
That field is flipped to false for 5.0 and later machine types in this patch. *) I might give another try to removing 'mem' completely in a migration-compatible manner, but that's well beyond the scope of this series. So far I haven't been able to convince myself that the previous attempts to do it were absolutely correct for all the corner cases that are there. > Michal
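A sketch of the capability check Igor describes: scan the QMP query-machines output for the numa-mem-supported flag before choosing the -numa syntax. The sample response below is abbreviated and illustrative, not a verbatim QEMU reply:

```python
# Decide between '-numa node,mem=' and '-numa node,memdev=' based on
# the numa-mem-supported flag from QMP query-machines.
def mem_option_supported(machines, machine_name):
    for m in machines:
        if m["name"] == machine_name:
            return m.get("numa-mem-supported", False)
    raise KeyError("unknown machine type: " + machine_name)

# Abbreviated, illustrative sample of a query-machines reply:
sample = [
    {"name": "pc-i440fx-4.2", "numa-mem-supported": True},
    {"name": "pc-i440fx-5.0", "numa-mem-supported": False},
]
```

A management layer would issue {"execute": "query-machines"} once, then pick the legacy 'mem=' syntax only for machine types where the flag is still true.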
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node, mem' for 5.0 and newer machine types
On Wed, 15 Jan 2020 16:34:53 +0100 Peter Krempa wrote: > On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > > Deprecation period is ran out and it's a time to flip the switch > > introduced by cd5ff8333a. > > Disable legacy option for new machine types and amend documentation. > > > > Signed-off-by: Igor Mammedov > > --- > > CC: peter.mayd...@linaro.org > > CC: ehabk...@redhat.com > > CC: marcel.apfelb...@gmail.com > > CC: m...@redhat.com > > CC: pbonz...@redhat.com > > CC: r...@twiddle.net > > CC: da...@gibson.dropbear.id.au > > CC: libvir-list@redhat.com > > CC: qemu-...@nongnu.org > > CC: qemu-...@nongnu.org > > --- > > hw/arm/virt.c| 2 +- > > hw/core/numa.c | 6 ++ > > hw/i386/pc.c | 1 - > > hw/i386/pc_piix.c| 1 + > > hw/i386/pc_q35.c | 1 + > > hw/ppc/spapr.c | 2 +- > > qemu-deprecated.texi | 16 > > qemu-options.hx | 8 > > 8 files changed, 14 insertions(+), 23 deletions(-) > > I'm afraid nobody bothered to fix it yet: > > https://bugzilla.redhat.com/show_bug.cgi?id=1783355 It's time to start working on it :) (looks like just deprecating stuff isn't sufficient motivation, maybe actual switch flipping would work out better) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v2 86/86] numa: remove deprecated implicit RAM distribution between nodes
Feature has been deprecated since 4.1 (4bb4a273), remove it. As result if RAM distribution wasn't specified explicitly, the machine won't start and CLI should be changed to explicitly assign RAM to nodes using options: -node node,memdev (5.0 and newer machine types) -node node,mem (4.2 and older machine types) It's recommended to use "memdev" variant for new virtual machines and use "mem" only when it's necessary to migrate already existing virtual machine started with implicit RAM distribution. Signed-off-by: Igor Mammedov --- CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: r...@twiddle.net --- include/hw/boards.h | 3 --- include/sysemu/numa.h | 4 hw/core/machine.c | 6 - hw/core/numa.c| 61 +-- hw/i386/pc_piix.c | 1 - hw/i386/pc_q35.c | 1 - hw/ppc/spapr.c| 7 -- qemu-deprecated.texi | 8 --- qemu-options.hx | 16 +++--- 9 files changed, 13 insertions(+), 94 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 7f09bc9..916bb50 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -192,12 +192,9 @@ struct MachineClass { int minimum_page_bits; bool has_hotpluggable_cpus; bool ignore_memory_transaction_failures; -int numa_mem_align_shift; const char **valid_cpu_types; strList *allowed_dynamic_sysbus_devices; bool auto_enable_numa_with_memhp; -void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index ad58ee8..4173ef2 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -106,10 +106,6 @@ void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node, void numa_complete_configuration(MachineState *ms); void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms); extern QemuOptsList 
qemu_numa_opts; -void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); -void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); void numa_cpu_pre_plug(const struct CPUArchId *slot, DeviceState *dev, Error **errp); bool numa_uses_legacy_mem(void); diff --git a/hw/core/machine.c b/hw/core/machine.c index d8fa45c..0862f45 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -747,12 +747,6 @@ static void machine_class_init(ObjectClass *oc, void *data) mc->rom_file_has_mr = true; mc->smp_parse = smp_parse; -/* numa node memory size aligned on 8MB by default. - * On Linux, each node's border has to be 8MB aligned - */ -mc->numa_mem_align_shift = 23; -mc->numa_auto_assign_ram = numa_default_auto_assign_ram; - object_class_property_add_str(oc, "kernel", machine_get_kernel, machine_set_kernel, &error_abort); object_class_property_set_description(oc, "kernel", diff --git a/hw/core/numa.c b/hw/core/numa.c index 47d5ea1..591e62a 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -627,42 +627,6 @@ static void complete_init_numa_distance(MachineState *ms) } } -void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size) -{ -int i; -uint64_t usedmem = 0; - -/* Align each node according to the alignment - * requirements of the machine class - */ - -for (i = 0; i < nb_nodes - 1; i++) { -nodes[i].node_mem = (size / nb_nodes) & -~((1 << mc->numa_mem_align_shift) - 1); -usedmem += nodes[i].node_mem; -} -nodes[i].node_mem = size - usedmem; -} - -void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size) -{ -int i; -uint64_t usedmem = 0, node_mem; -uint64_t granularity = size / nb_nodes; -uint64_t propagate = 0; - -for (i = 0; i < nb_nodes - 1; i++) { -node_mem = (granularity + propagate) & - ~((1 << mc->numa_mem_align_shift) - 1); -propagate = granularity + propagate - node_mem; -nodes[i].node_mem = 
node_mem; -usedmem += node_mem; -} -nodes[i].node_mem = size - usedmem; -} - static void numa_init_memdev_container(MachineState *ms, MemoryRegion *ram) { int i; @@ -732,30 +696,15 @@ void numa_complete_configuration(MachineState *ms) ms->numa_state->num_nodes = MAX_NODES;
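For reference, the two C helpers removed by the patch above can be transcribed into Python to show what the implicit RAM distribution actually computed. This is an illustrative sketch, not QEMU code; the 8 MiB alignment shift (23) comes from the removed ``machine_class_init()`` default shown in the diff.

```python
ALIGN_SHIFT = 23                     # 8 MiB node alignment, per the removed default
MASK = ~((1 << ALIGN_SHIFT) - 1)

def legacy_auto_assign(size, nb_nodes):
    """Port of the removed numa_legacy_auto_assign_ram(): every node but
    the last gets an equal share rounded down to the alignment; the last
    node absorbs whatever remains."""
    nodes = [(size // nb_nodes) & MASK for _ in range(nb_nodes - 1)]
    nodes.append(size - sum(nodes))
    return nodes

def default_auto_assign(size, nb_nodes):
    """Port of the removed numa_default_auto_assign_ram(): the rounding
    remainder is propagated to the next node instead of piling up
    entirely on the last one."""
    granularity = size // nb_nodes
    propagate = 0
    nodes = []
    for _ in range(nb_nodes - 1):
        node_mem = (granularity + propagate) & MASK
        propagate = granularity + propagate - node_mem
        nodes.append(node_mem)
    nodes.append(size - sum(nodes))
    return nodes

GiB = 1 << 30
print(legacy_auto_assign(3 * GiB, 2))  # → [1610612736, 1610612736]
```

After this patch, neither distribution is applied for new machine types: the user must assign RAM to nodes explicitly with ``-numa node,memdev``.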
[libvirt] [PATCH v2 82/86] numa: forbid '-numa node, mem' for 5.0 and newer machine types
Deprecation period is ran out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types and amend documentation. Signed-off-by: Igor Mammedov --- CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org --- hw/arm/virt.c| 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c| 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-deprecated.texi | 16 qemu-options.hx | 8 8 files changed, 14 insertions(+), 23 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e2fbca3..49de0d8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2049,7 +2049,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc->pre_plug = virt_machine_device_pre_plug_cb; hc->plug = virt_machine_device_plug_cb; hc->unplug_request = virt_machine_device_unplug_request_cb; -mc->numa_mem_supported = true; mc->auto_enable_numa_with_memhp = true; mc->default_ram_id = "mach-virt.ram"; } @@ -2153,6 +2152,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 0) static void virt_machine_4_2_options(MachineClass *mc) { compat_props_add(mc->compat_props, hw_compat_4_2, hw_compat_4_2_len); +mc->numa_mem_supported = true; } DEFINE_VIRT_MACHINE(4, 2) diff --git a/hw/core/numa.c b/hw/core/numa.c index 0970a30..3177066 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -117,6 +117,12 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, } if (node->has_mem) { +if (!mc->numa_mem_supported) { +error_setg(errp, "Parameter -numa node,mem is not supported by this" + " machine type. 
Use -numa node,memdev instead"); +return; +} + numa_info[nodenr].node_mem = node->mem; if (!qtest_enabled()) { warn_report("Parameter -numa node,mem is deprecated," diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 21b8290..fa8d024 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1947,7 +1947,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) hc->unplug = pc_machine_device_unplug_cb; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; -mc->numa_mem_supported = true; mc->default_ram_id = "pc.ram"; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index fa12203..0a9b9e0 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -435,6 +435,7 @@ static void pc_i440fx_4_2_machine_options(MachineClass *m) pc_i440fx_5_0_machine_options(m); m->alias = NULL; m->is_default = 0; +m->numa_mem_supported = true; compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len); compat_props_add(m->compat_props, pc_compat_4_2, pc_compat_4_2_len); } diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 84cf925..4d6e2be 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -363,6 +363,7 @@ static void pc_q35_4_2_machine_options(MachineClass *m) { pc_q35_5_0_machine_options(m); m->alias = NULL; +m->numa_mem_supported = true; compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len); compat_props_add(m->compat_props, pc_compat_4_2, pc_compat_4_2_len); } diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index bcbe1f1..2686b73 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4383,7 +4383,6 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; -mc->numa_mem_supported = true; mc->auto_enable_numa = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; @@ -4465,6 +4464,7 @@ static void spapr_machine_4_2_class_options(MachineClass *mc) { 
spapr_machine_5_0_class_options(mc); compat_props_add(mc->compat_props, hw_compat_4_2, hw_compat_4_2_len); +mc->numa_mem_supported = true; } DEFINE_SPAPR_MACHINE(4_2, "4.2", false); diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 982af95..17a0e1d 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -89,22 +89,6 @@ error in the future. The @code{-realtime mlock=on|off} argument has been replaced by the @code{-overcommit mem-lock=on|off} argument. -@subsection -numa node,mem=@var{size} (since 4.1) - -The parameter @option{mem} of @option{-numa node} is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to
Re: [libvirt] [PATCH 2/2] Add -mem-shared option
On Tue, 10 Dec 2019 11:34:32 +0100 Markus Armbruster wrote: > Eduardo Habkost writes: > > > +Markus > > > > On Tue, Dec 03, 2019 at 03:43:03PM +0100, Igor Mammedov wrote: > >> On Tue, 3 Dec 2019 09:56:15 +0100 > >> Thomas Huth wrote: > >> > >> > On 02/12/2019 22.00, Eduardo Habkost wrote: > >> > > On Mon, Dec 02, 2019 at 08:39:48AM +0100, Igor Mammedov wrote: > >> > >> On Fri, 29 Nov 2019 18:46:12 +0100 > >> > >> Paolo Bonzini wrote: > >> > >> > >> > >>> On 29/11/19 13:16, Igor Mammedov wrote: > >> > >>>> As for "-m", I'd make it just an alias that translates > >> > >>>> -m/mem-path/mem-prealloc > >> > >>> > >> > >>> I think we should just deprecate -mem-path/-mem-prealloc in 5.0. > >> > >>> CCing > >> > >>> Thomas as mister deprecation. :) > >> > >> > >> > >> I'll add that to my series > >> > > > >> > > Considering that the plan is to eventually reimplement those > >> > > options as syntactic sugar for memory backend options (hopefully > >> > > in less than 2 QEMU releases), what's the point of deprecating > >> > > them? > >> > > >> > Well, it depends on the "classification" [1] of the parameter... > >> > > >> > Let's ask: What's the main purpose of the option? > >> > > >> > Is it easier to use than the "full" option, and thus likely to be used > >> > by a lot of people who run QEMU directly from the CLI? In that case it > >> > should stay as "convenience option" and not be deprecated. > >> > > >> > Or is the option merely there to give the upper layers like libvirt or > >> > some few users and their scripts some more grace period to adapt their > >> > code, but we all agree that the options are rather ugly and should > >> > finally go away? Then it's rather a "legacy option" and the deprecation > >> > process is the right way to go. Our QEMU interface is still way > >> > overcrowded, we should try to keep it as clean as possible. 
> >> > >> After switching to memdev for main RAM, users could use relatively > >> short global options > >> -global memory-backend.prealloc|share=on > >> and > >> -global memory-backend-file.mem-path=X|prealloc|share=on > >> > >> instead of us adding and maintaining slightly shorter > >> -mem-shared/-mem-path/-mem-prealloc > > > > Global properties are a convenient way to expose knobs through > > the command line with little effort, but we have no documentation > > on which QOM properties are really supposed to be touched by > > users using -global. > > > > Unless we fix the lack of documentation, I'd prefer to have > > syntactic sugar translated to -global instead of recommending > > direct usage of -global. > > Fair point. > > I'd take QOM property documentation over still more sugar. > > Sometimes, the practical way to make simple things simple is sugar. I > can accept that. This doesn't look like such a case, though. I can document concrete globals as replacement at the place -mem-path/-mem-prealloc are documented during deprecation and then in 2 releases we will just drop legacy syntax and keep only globals over there. (eventually it will spread various globals over man page, which I don't like but we probably should start somwhere and consolidate later if globals in man page become normal practice.) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
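To make the suggested replacement concrete, the following is a hypothetical helper (not part of QEMU or libvirt) that translates the legacy ``-mem-path``/``-mem-prealloc`` switches, plus the proposed ``-mem-shared``, into the ``-global memory-backend*`` properties discussed above. The path used in the example is illustrative.

```python
def legacy_to_globals(mem_path=None, mem_prealloc=False, mem_shared=False):
    """Hypothetical translation of the legacy memory CLI switches into
    the -global memory-backend* properties suggested in the thread."""
    # File-backed RAM uses memory-backend-file; plain RAM the base class.
    backend = "memory-backend-file" if mem_path else "memory-backend"
    args = []
    if mem_path:
        args += ["-global", f"memory-backend-file.mem-path={mem_path}"]
    if mem_prealloc:
        args += ["-global", f"{backend}.prealloc=on"]
    if mem_shared:
        args += ["-global", f"{backend}.share=on"]
    return args

# Illustrative path only:
print(legacy_to_globals(mem_path="/dev/hugepages/guest", mem_prealloc=True))
```

This mirrors the trade-off discussed above: the ``-global`` form is slightly longer than dedicated sugar options, but needs no extra option parsing code in QEMU.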
Re: [libvirt] [PATCH 2/2] Add -mem-shared option
On Tue, 3 Dec 2019 09:56:15 +0100 Thomas Huth wrote: > On 02/12/2019 22.00, Eduardo Habkost wrote: > > On Mon, Dec 02, 2019 at 08:39:48AM +0100, Igor Mammedov wrote: > >> On Fri, 29 Nov 2019 18:46:12 +0100 > >> Paolo Bonzini wrote: > >> > >>> On 29/11/19 13:16, Igor Mammedov wrote: > >>>> As for "-m", I'd make it just an alias that translates > >>>> -m/mem-path/mem-prealloc > >>> > >>> I think we should just deprecate -mem-path/-mem-prealloc in 5.0. CCing > >>> Thomas as mister deprecation. :) > >> > >> I'll add that to my series > > > > Considering that the plan is to eventually reimplement those > > options as syntactic sugar for memory backend options (hopefully > > in less than 2 QEMU releases), what's the point of deprecating > > them? > > Well, it depends on the "classification" [1] of the parameter... > > Let's ask: What's the main purpose of the option? > > Is it easier to use than the "full" option, and thus likely to be used > by a lot of people who run QEMU directly from the CLI? In that case it > should stay as "convenience option" and not be deprecated. > > Or is the option merely there to give the upper layers like libvirt or > some few users and their scripts some more grace period to adapt their > code, but we all agree that the options are rather ugly and should > finally go away? Then it's rather a "legacy option" and the deprecation > process is the right way to go. Our QEMU interface is still way > overcrowded, we should try to keep it as clean as possible. After switching to memdev for main RAM, users could use relatively short global options -global memory-backend.prealloc|share=on and -global memory-backend-file.mem-path=X|prealloc|share=on instead of us adding and maintaining slightly shorter -mem-shared/-mem-path/-mem-prealloc > Thomas > > > [1] Using the terms from: > https://www.youtube.com/watch?v=Oscjpkns7tM&t=8m -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v2] deprecate -mem-path fallback to anonymous RAM
Fallback might affect guest or worse whole host performance or functionality if backing file were used to share guest RAM with another process. Patch deprecates fallback so that we could remove it in future and ensure that QEMU will provide expected behavior and fail if it can't use user provided backing file. Signed-off-by: Igor Mammedov --- v2: * improve text language (Markus Armbruster ) numa.c | 6 -- qemu-deprecated.texi | 9 + 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/numa.c b/numa.c index 91a29138a2..c15e53e92d 100644 --- a/numa.c +++ b/numa.c @@ -494,8 +494,10 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, if (mem_prealloc) { exit(1); } -error_report("falling back to regular RAM allocation."); - +warn_report("falling back to regular RAM allocation"); +error_printf("This is deprecated. Make sure that -mem-path " + " specified path has sufficient resources to allocate" + " -m specified RAM amount or QEMU will fail to start"); /* Legacy behavior: if allocation failed, fall back to * regular RAM allocation. */ diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 2fe9b72121..1b7f3b10dc 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -112,6 +112,15 @@ QEMU using implicit generic or board specific splitting rule. Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} (if it's supported by used machine type) to define mapping explictly instead. +@subsection -mem-path fallback to RAM (since 4.1) +Currently if guest RAM allocation from file pointed by @option{mem-path} +fails, QEMU falls back to allocating from RAM, which might result +in unpredictable behavior since the backing file specified by the user +is ignored. In the future, users will be responsible for making sure +the backing storage specified with @option{-mem-path} can actually provide +the guest RAM configured with @option{-m} and fail to start up if RAM allocation +is unsuccessful. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.18.1 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
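The control flow the patch above deprecates can be summarized with a small Python model. This is a simplification of ``allocate_system_memory_nonnuma()`` for illustration only, not the real QEMU code: it shows what happens today when allocating guest RAM from the ``-mem-path`` file fails.

```python
def resolve_allocation(file_alloc_ok, mem_prealloc):
    """Model of the (deprecated) -mem-path fallback decision:
    file_alloc_ok  -- whether allocation from the -mem-path file succeeded
    mem_prealloc   -- whether -mem-prealloc was given on the command line
    """
    if file_alloc_ok:
        return "file-backed"
    if mem_prealloc:
        # With -mem-prealloc, allocation failure is already fatal.
        return "exit"
    # Deprecated legacy behaviour: warn, then silently ignore -mem-path.
    print("warning: falling back to regular RAM allocation (deprecated)")
    return "anonymous-fallback"
```

Once the deprecation period expires, the third branch is intended to disappear: a failed file-backed allocation will make QEMU fail to start instead of quietly using anonymous RAM.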
Re: [libvirt] [Qemu-devel] [PATCH] deprecate -mem-path fallback to anonymous RAM
On Mon, 24 Jun 2019 16:01:49 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > On Mon, 24 Jun 2019 10:17:33 +0200 > > Markus Armbruster wrote: > > > >> Igor Mammedov writes: > >> > >> > Fallback might affect guest or worse whole host performance > >> > or functionality if backing file were used to share guest RAM > >> > with another process. > >> > > >> > Patch deprecates fallback so that we could remove it in future > >> > and ensure that QEMU will provide expected behavior and fail if > >> > it can't use user provided backing file. > >> > > >> > Signed-off-by: Igor Mammedov > >> > --- > >> > PS: > >> > Patch is written on top of > >> > [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory > >> > distribution > >> > to avoid conflicts in qemu-deprecated.texi > >> > > >> > numa.c | 4 ++-- > >> > qemu-deprecated.texi | 8 > >> > 2 files changed, 10 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/numa.c b/numa.c > >> > index 91a29138a2..53d67b8ad9 100644 > >> > --- a/numa.c > >> > +++ b/numa.c > >> > @@ -494,8 +494,8 @@ static void > >> > allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, > >> > if (mem_prealloc) { > >> > exit(1); > >> > } > >> > -error_report("falling back to regular RAM allocation."); > >> > - > >> > +warn_report("falling back to regular RAM allocation. " > >> > +"Fallback to RAM allocation is deprecated."); > >> > > >> > >> Can we give the user clues on how to avoid the deprecated fallback? > > > > I've intentionally left it out for a lack of clear enough advise. > > Something like: > > "Make sure that host has resources to map file pointed by -mem-path" > > would be pretty useless. > > I see. > > > I think describing how host should be configured in various ways > > depending on type of backing storage is well out of scope of any > > QEMU documentation. But if you have an idea to what to put there > > (or what to put in deprecation doc and refer to from here), > > I'll add it on respin. 
> > > >> Warning message nitpick: the message should be a single phrase, with no > >> newline or trailing punctuation. Suggest something like > >> > >>warn_report("falling back to regular RAM allocation"); > >>error_printf("This is deprecated. >> "to do goes here>\n"); > >> > >> > /* Legacy behavior: if allocation failed, fall back to > >> > * regular RAM allocation. > >> > */ > >> > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > >> > index 2fe9b72121..2193705644 100644 > >> > --- a/qemu-deprecated.texi > >> > +++ b/qemu-deprecated.texi > >> > @@ -112,6 +112,14 @@ QEMU using implicit generic or board specific > >> > splitting rule. > >> > Use @option{memdev} with @var{memory-backend-ram} backend or > >> > @option{mem} (if > >> > it's supported by used machine type) to define mapping explictly > >> > instead. > >> > > >> > +@subsection -mem-path fallback to RAM (since 4.1) > >> > +Currently if system memory allocation from file pointed by > >> > @option{mem-path} > >> > +fails, QEMU fallbacks to allocating from anonymous RAM. Which might > >> > result > >> > +in unpredictable behavior since provided backing file wasn't used. > >> > >> > >> Noch such verb "to fallback", obvious fix "QEMU falls back to" > >> > >> Suggest "RAM, which might". > >> > >> Better: "since the backing file specified by the user is ignored". > >> > >> > In > >> > future > >> > +QEMU will not fallback and fail to start up, so user could fix his/her > >> > QEMU/host > >> > +configuration or explicitly use -m without -mem-path if system memo
Re: [libvirt] [Qemu-devel] [PATCH] deprecate -mem-path fallback to anonymous RAM
On Mon, 24 Jun 2019 10:36:55 +0100 Daniel P. Berrangé wrote: > On Mon, Jun 24, 2019 at 10:17:33AM +0200, Markus Armbruster wrote: > > Igor Mammedov writes: > > > > > Fallback might affect guest or worse whole host performance > > > or functionality if backing file were used to share guest RAM > > > with another process. > > > > > > Patch deprecates fallback so that we could remove it in future > > > and ensure that QEMU will provide expected behavior and fail if > > > it can't use user provided backing file. > > > > > > Signed-off-by: Igor Mammedov > > > --- > > > PS: > > > Patch is written on top of > > > [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory > > > distribution > > > to avoid conflicts in qemu-deprecated.texi > > > > > > numa.c | 4 ++-- > > > qemu-deprecated.texi | 8 > > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/numa.c b/numa.c > > > index 91a29138a2..53d67b8ad9 100644 > > > --- a/numa.c > > > +++ b/numa.c > > > @@ -494,8 +494,8 @@ static void > > > allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, > > > if (mem_prealloc) { > > > exit(1); > > > } > > > -error_report("falling back to regular RAM allocation."); > > > - > > > +warn_report("falling back to regular RAM allocation. " > > > +"Fallback to RAM allocation is deprecated."); > > > > Can we give the user clues on how to avoid the deprecated fallback? > > There's nothing a user can do aside from ensuring they have sufficient > free memory before launching QEMU to satisfy the huge pag request. > > Probably just needs changing to do. > > "This is deprecated, future QEMU releases will exit when > huge pages cannot be allocated" Also it could be that users might use other than hugepages backing storage, that's why I completely left concrete advice out from suggestion. User should know what he/she is doing when providing mem-path, if user supplies mis-configured path QEMU will print error from memory-backend-file if/when allocation fails. 
> Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH] deprecate -mem-path fallback to anonymous RAM
On Mon, 24 Jun 2019 10:17:33 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > Fallback might affect guest or worse whole host performance > > or functionality if backing file were used to share guest RAM > > with another process. > > > > Patch deprecates fallback so that we could remove it in future > > and ensure that QEMU will provide expected behavior and fail if > > it can't use user provided backing file. > > > > Signed-off-by: Igor Mammedov > > --- > > PS: > > Patch is written on top of > > [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory > > distribution > > to avoid conflicts in qemu-deprecated.texi > > > > numa.c | 4 ++-- > > qemu-deprecated.texi | 8 > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/numa.c b/numa.c > > index 91a29138a2..53d67b8ad9 100644 > > --- a/numa.c > > +++ b/numa.c > > @@ -494,8 +494,8 @@ static void allocate_system_memory_nonnuma(MemoryRegion > > *mr, Object *owner, > > if (mem_prealloc) { > > exit(1); > > } > > -error_report("falling back to regular RAM allocation."); > > - > > +warn_report("falling back to regular RAM allocation. " > > +"Fallback to RAM allocation is deprecated."); > > Can we give the user clues on how to avoid the deprecated fallback? I've intentionally left it out for a lack of clear enough advise. Something like: "Make sure that host has resources to map file pointed by -mem-path" would be pretty useless. I think describing how host should be configured in various ways depending on type of backing storage is well out of scope of any QEMU documentation. But if you have an idea to what to put there (or what to put in deprecation doc and refer to from here), I'll add it on respin. > Warning message nitpick: the message should be a single phrase, with no > newline or trailing punctuation. Suggest something like > >warn_report("falling back to regular RAM allocation"); >error_printf("This is deprecated. 
"to do goes here>\n"); > > > /* Legacy behavior: if allocation failed, fall back to > > * regular RAM allocation. > > */ > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > > index 2fe9b72121..2193705644 100644 > > --- a/qemu-deprecated.texi > > +++ b/qemu-deprecated.texi > > @@ -112,6 +112,14 @@ QEMU using implicit generic or board specific > > splitting rule. > > Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} > > (if > > it's supported by used machine type) to define mapping explictly instead. > > > > +@subsection -mem-path fallback to RAM (since 4.1) > > +Currently if system memory allocation from file pointed by > > @option{mem-path} > > +fails, QEMU fallbacks to allocating from anonymous RAM. Which might result > > +in unpredictable behavior since provided backing file wasn't used. > > > Noch such verb "to fallback", obvious fix "QEMU falls back to" > > Suggest "RAM, which might". > > Better: "since the backing file specified by the user is ignored". > > > In > > future > > +QEMU will not fallback and fail to start up, so user could fix his/her > > QEMU/host > > +configuration or explicitly use -m without -mem-path if system memory > > allocated > > +from anonymous RAM suits usecase. > > What's "system memory allocation"? Using man page language, would be 'guest startup RAM size' acceptable? > Perhaps: "In the future, QEMU will not fall back, but fail instead. > Adjust either the host configuration [FIXME how?] or the QEMU > configuration [FIXME how?]." Maybe " In the future, QEMU will not fall back, but fail instead. Adjust either the QEMU configuration by removing @option{-mem-path} so QEMU will use only anonymous or host configuration to make sure that there are sufficient resources on backing storage pointed by -mem-path to allocate amount specified by @option{-m}. 
" > > + > > @section QEMU Machine Protocol (QMP) commands > > > > @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] deprecate -mem-path fallback to anonymous RAM
Fallback might affect guest or worse whole host performance or functionality if backing file were used to share guest RAM with another process. Patch deprecates fallback so that we could remove it in future and ensure that QEMU will provide expected behavior and fail if it can't use user provided backing file. Signed-off-by: Igor Mammedov --- PS: Patch is written on top of [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution to avoid conflicts in qemu-deprecated.texi numa.c | 4 ++-- qemu-deprecated.texi | 8 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/numa.c b/numa.c index 91a29138a2..53d67b8ad9 100644 --- a/numa.c +++ b/numa.c @@ -494,8 +494,8 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, if (mem_prealloc) { exit(1); } -error_report("falling back to regular RAM allocation."); - +warn_report("falling back to regular RAM allocation. " +"Fallback to RAM allocation is deprecated."); /* Legacy behavior: if allocation failed, fall back to * regular RAM allocation. */ diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 2fe9b72121..2193705644 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -112,6 +112,14 @@ QEMU using implicit generic or board specific splitting rule. Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} (if it's supported by used machine type) to define mapping explictly instead. +@subsection -mem-path fallback to RAM (since 4.1) +Currently if system memory allocation from file pointed by @option{mem-path} +fails, QEMU fallbacks to allocating from anonymous RAM. Which might result +in unpredictable behavior since provided backing file wasn't used. In future +QEMU will not fallback and fail to start up, so user could fix his/her QEMU/host +configuration or explicitly use -m without -mem-path if system memory allocated +from anonymous RAM suits usecase. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.18.1 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution
On Thu, 30 May 2019 10:33:16 +0200 Igor Mammedov wrote: > Changes since v3: > - simplify series by dropping idea of showing property values in > "qom-list-properties" > and use MachineInfo in QAPI schema instead > > Changes since v2: > - taking in account previous review, implement a way for mgmt to intospect > if > '-numa node,mem' is supported by machine type as suggested by Daniel at > https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html > * ammend "qom-list-properties" to show property values > * add "numa-mem-supported" machine property to reflect if '-numa > node,mem=SZ' > is supported. It culd be used with '-machine none' or at runtime with > --preconfig before numa memory mapping are configured > * minor fixes to deprecation documentation mentioning "numa-mem-supported" > property > > 1) "I'm considering to deprecating -mem-path/prealloc CLI options and > replacing > them with a single memdev Machine property to allow interested users to pick > used backend for initial RAM (fixes mixed -mem-path+hostmem backends issues) > and as a transition step to modeling initial RAM as a Device instead of > (ab)using MemoryRegion APIs." > (for more details see: > https://www.mail-archive.com/qemu-devel@nongnu.org/msg596314.html) > > However there is a couple of roadblocks on the way (s390x and numa memory > handling). > I think I finally thought out a way to hack s390x in migration compatible > manner, > but I don't see any way to do it for -numa node,mem and default RAM > assignement > to nodes. Considering both numa usecases aren't meaningfully using NUMA (aside > guest side testing) and could be replaced with explicitly used memdev > parameter, > I'd like to propose removing these fake NUMA friends on new machine types, > hence this deprecation. And once the last machie type that supported the > option > is removed we would be able to remove option altogether. 
> > As a result of removing deprecated options and replacing initial RAM allocation > with 'memdev's (1), QEMU will allocate guest RAM in a consistent way, fixing the > mixed > use case and allowing boards to move towards modelling initial RAM as > Device(s), > which in turn should allow cleaning up the NUMA/HMP/memory accounting code > further by dropping ad-hoc node_mem tracking and reusing memory device > enumeration > instead. Eduardo, could you take and merge it via the numa/machine tree? > > Reference to previous versions: > * https://www.mail-archive.com/qemu-devel@nongnu.org/msg617694.html > > CC: libvir-list@redhat.com > CC: ehabk...@redhat.com > CC: pbonz...@redhat.com > CC: berra...@redhat.com > CC: arm...@redhat.com > > Igor Mammedov (3): > machine: show if CLI option '-numa node,mem' is supported in QAPI > schema > numa: deprecate 'mem' parameter of '-numa node' option > numa: deprecate implicit memory distribution between nodes > > include/hw/boards.h | 3 +++ > hw/arm/virt.c| 1 + > hw/i386/pc.c | 1 + > hw/ppc/spapr.c | 1 + > numa.c | 5 +++++ > qapi/misc.json | 5 ++++- > qemu-deprecated.texi | 24 ++++++++++++++++++++++++ > vl.c | 1 + > 8 files changed, 40 insertions(+), 1 deletion(-) > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v5 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema
The legacy '-numa node,mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's not possible to replace it with the alternative '-numa node,memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping the option working with old machine types only. To help users find out whether the deprecated CLI option '-numa node,mem' is still supported by a particular machine type, add a new "numa-mem-supported" property to the output of query-machines. "numa-mem-supported" is set to 'true' for machines that currently support NUMA, but it will be flipped to 'false' later on, once the deprecation period expires; it will be kept 'true' only for old machine types that used to support the legacy option, so existing configurations that use it won't break. Signed-off-by: Igor Mammedov --- v5: (Markus Armbruster ) * s/by machine type/by the machine type/ * amend commit message s/to MachineInfo description in QAPI schema/to output of query-machines/ v4: * drop idea to use "qom-list-properties" and use MachineInfo instead which could be inspected with 'query-machines' include/hw/boards.h | 3 +++ hw/arm/virt.c | 1 + hw/i386/pc.c| 1 + hw/ppc/spapr.c | 1 + qapi/misc.json | 5 ++++- vl.c| 1 + 6 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 6ff02bf..ab6badc 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -158,6 +158,8 @@ typedef struct { * @kvm_type: *Return the type of KVM corresponding to the kvm-type string option or *computed based on other criteria such as the host kernel capabilities. 
+ * @numa_mem_supported: + *true if '-numa node,mem' option is supported and false otherwise */ struct MachineClass { /*< private >*/ @@ -210,6 +212,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; +bool numa_mem_supported; HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index bf54f10..481a603 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; +mc->numa_mem_supported = true; } static void virt_instance_init(Object *obj) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index edc240b..25146d7 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2750,6 +2750,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; +mc->numa_mem_supported = true; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index e2b33e5..89d5814 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4340,6 +4340,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; +mc->numa_mem_supported = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..2dbfdf0 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -2018,12 +2018,15 @@ # # @hotpluggable-cpus: cpu hotplug via -device is supported (since 2.7.0) # +# @numa-mem-supported: true if '-numa node,mem' option is supported by +# the machine type and false otherwise 
(since 4.1) +# # Since: 1.2.0 ## { 'struct': 'MachineInfo', 'data': { 'name': 'str', '*alias': 'str', '*is-default': 'bool', 'cpu-max': 'int', -'hotpluggable-cpus': 'bool'} } +'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool'} } ## # @query-machines: diff --git a/vl.c b/vl.c index cd1fbc4..f5b083f 100644 --- a/vl.c +++ b/vl.c @@ -1428,6 +1428,7 @@ MachineInfoList *qmp_query_machines(Error **errp) info->name = g_strdup(mc->name); info->cpu_max = !mc->max_cpus ? 1 : mc->max_cpus; info->hotpluggable_cpus = mc->has_hotpluggable_cpus; +info->numa_mem_supported = mc->numa_mem_supported; entry = g_malloc0(sizeof(*entry)); entry->value = info; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
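To make the new interface concrete, here is a minimal, illustrative sketch (not part of the patch) of how a management application might consume the new field from a query-machines reply. The reply data and the helper name are hypothetical; the member names ("name", "cpu-max", "hotpluggable-cpus", "numa-mem-supported") follow the MachineInfo change above.

```python
# Hypothetical excerpt of a QMP `query-machines` reply, showing the
# `numa-mem-supported` member added by this patch.
reply = [
    {"name": "pc-i440fx-4.1", "cpu-max": 255,
     "hotpluggable-cpus": True, "numa-mem-supported": True},
    {"name": "none", "cpu-max": 1,
     "hotpluggable-cpus": False, "numa-mem-supported": False},
]

def numa_mem_supported(machines, machine_type):
    """Return True if the machine type still accepts '-numa node,mem'."""
    for m in machines:
        if m["name"] == machine_type or m.get("alias") == machine_type:
            return m["numa-mem-supported"]
    raise KeyError(machine_type)

# A client would pick the legacy '-numa node,mem' only where it is still
# supported, and '-numa node,memdev' otherwise.
assert numa_mem_supported(reply, "pc-i440fx-4.1") is True
assert numa_mem_supported(reply, "none") is False
```

A libvirt-style client would typically run query-machines once per QEMU binary and cache the flag per machine type, e.g. as part of its capabilities probing.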
Re: [libvirt] [Qemu-devel] [PATCH v4 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema
On Fri, 07 Jun 2019 19:39:17 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > Legacy '-numa node,mem' option has a number of issues and mgmt often > > defaults to it. Unfortunately it's no possible to replace it with > > an alternative '-numa memdev' without breaking migration compatibility. > > What's possible though is to deprecate it, keeping option working with > > old machine types only. > > > > In order to help users to find out if being deprecated CLI option > > '-numa node,mem' is still supported by particular machine type, add new > > "numa-mem-supported" property to MachineInfo description in QAPI schema. > > Suggest s/to MachineInfo description in QAPI schema/to output of > query-machines/, because query-machines is the external interface people > know. fixed > > > "numa-mem-supported" is set to 'true' for machines that currently support > > NUMA, but it will be flipped to 'false' later on, once deprecation period > > expires and kept 'true' only for old machine types that used to support > > the legacy option so it won't break existing configuration that are using > > it. > > > > Signed-off-by: Igor Mammedov > > --- > > > > Notes: > > v4: > > * drop idea to use "qom-list-properties" and use MachineInfo instead > > which could be inspected with 'query-machines' > > > > include/hw/boards.h | 3 +++ > > hw/arm/virt.c | 1 + > > hw/i386/pc.c| 1 + > > hw/ppc/spapr.c | 1 + > > qapi/misc.json | 5 - > > vl.c| 1 + > > 6 files changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h > > index 6f7916f..86894b6 100644 > > --- a/include/hw/boards.h > > +++ b/include/hw/boards.h > > @@ -158,6 +158,8 @@ typedef struct { > > * @kvm_type: > > *Return the type of KVM corresponding to the kvm-type string option or > > *computed based on other criteria such as the host kernel > > capabilities. 
> > + * @numa_mem_supported: > > + *true if '--numa node.mem' option is supported and false otherwise > > */ > > struct MachineClass { > > /*< private >*/ > > @@ -210,6 +212,7 @@ struct MachineClass { > > bool ignore_boot_device_suffixes; > > bool smbus_no_migration_support; > > bool nvdimm_supported; > > +bool numa_mem_supported; > > > > HotplugHandler *(*get_hotplug_handler)(MachineState *machine, > > DeviceState *dev); > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index bf54f10..481a603 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, > > void *data) > > assert(!mc->get_hotplug_handler); > > mc->get_hotplug_handler = virt_machine_get_hotplug_handler; > > hc->plug = virt_machine_device_plug_cb; > > +mc->numa_mem_supported = true; > > } > > > > static void virt_instance_init(Object *obj) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > > index 2632b73..05b8368 100644 > > --- a/hw/i386/pc.c > > +++ b/hw/i386/pc.c > > @@ -2747,6 +2747,7 @@ static void pc_machine_class_init(ObjectClass *oc, > > void *data) > > nc->nmi_monitor_handler = x86_nmi; > > mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; > > mc->nvdimm_supported = true; > > +mc->numa_mem_supported = true; > > > > object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", > > pc_machine_get_device_memory_region_size, NULL, > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 2ef3ce4..265ecfb 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, > > void *data) > > * in which LMBs are represented and hot-added > > */ > > mc->numa_mem_align_shift = 28; > > +mc->numa_mem_supported = true; > > > > smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; > > smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; > > This is correct when the TYPE_VIRT_MACHINE, TYPE_PC_MACHINE and > TYPE_SPAPR_MACHINE are exactly the machines supporting 
NUMA. How could > I check that? We don't have an interface to communicate that to
Re: [libvirt] [Qemu-devel] [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution
On Fri, 07 Jun 2019 19:28:58 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > Changes since v3: > > - simplify series by dropping idea of showing property values in > > "qom-list-properties" > > and use MachineInfo in QAPI schema instead > > Where did "[PATCH v3 1/6] pc: fix possible NULL pointer dereference in > pc_machine_get_device_memory_region_size()" go? It fixes a crash bug... I'll post it as a separate patch as it's no longer related to this series -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v4 3/3] numa: deprecate implicit memory distribution between nodes
Implicit RAM distribution between nodes has exactly the same issues as: "numa: deprecate 'mem' parameter of '-numa node' option", only with QEMU itself being the user that 'adds' the 'mem' parameter. Deprecate it to get it out of the way, so that we can consolidate guest RAM allocation using memory backends, making it consistent, and possibly later on transition to using memory devices instead of ad hoc memory mapping for the initial RAM. Signed-off-by: Igor Mammedov --- numa.c | 3 +++ qemu-deprecated.texi | 8 ++++++++ 2 files changed, 11 insertions(+) diff --git a/numa.c b/numa.c index 2205773..6d45a1f 100644 --- a/numa.c +++ b/numa.c @@ -409,6 +409,9 @@ void numa_complete_configuration(MachineState *ms) if (i == nb_numa_nodes) { assert(mc->numa_auto_assign_ram); mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size); +warn_report("Default splitting of RAM between nodes is deprecated." +" Use '-numa node,memdev' to explicitly define RAM" +" allocation per node"); } numa_total = 0; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index eb347f5..c744ba9 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -98,6 +98,12 @@ In future new machine versions will not accept the option but it will still work with old machine types. User can check QAPI schema to see if the legacy option is supported by looking at MachineInfo::numa-mem-supported property. +@subsection -numa node (without memory specified) (since 4.1) + +Splitting RAM by default between NUMA nodes has the same issues as the @option{mem} +parameter described above, with the difference that QEMU itself plays the role +of the user, applying an implicit generic or board-specific splitting rule. +Use @option{memdev} with the @var{memory-backend-ram} backend, or @option{mem} (if +it's supported by the machine type in use), to define the mapping explicitly instead. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
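As a side note on what the deprecated behaviour actually does: when the user specifies nodes without memory, numa_auto_assign_ram splits the initial RAM between them. A simplified, illustrative model of the generic splitting rule (equal aligned shares, remainder to the last node) might look like the sketch below; the 8 MiB alignment is only an assumed default, boards can override it (e.g. the spapr hunk above sets numa_mem_align_shift = 28, i.e. 256 MiB).

```python
def default_split(ram_size, nb_nodes, align=1 << 23):
    """Simplified model of QEMU's implicit RAM split between NUMA nodes:
    every node but the last gets an equal share aligned down to `align`
    (8 MiB assumed here), and the last node absorbs the remainder."""
    share = (ram_size // nb_nodes) & ~(align - 1)
    sizes = [share] * (nb_nodes - 1)
    sizes.append(ram_size - share * (nb_nodes - 1))
    return sizes

# 4 GiB over 3 nodes: two equal aligned shares, remainder on the last node.
sizes = default_split(4 << 30, 3)
assert sum(sizes) == 4 << 30
assert sizes[0] == sizes[1]
```

This is exactly the kind of opaque, board-dependent distribution the deprecation warning steers users away from, in favour of explicit '-numa node,memdev' per-node sizing.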
[libvirt] [PATCH v4 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema
The legacy '-numa node,mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's not possible to replace it with the alternative '-numa node,memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping the option working with old machine types only. To help users find out whether the deprecated CLI option '-numa node,mem' is still supported by a particular machine type, add a new "numa-mem-supported" property to MachineInfo description in QAPI schema. "numa-mem-supported" is set to 'true' for machines that currently support NUMA, but it will be flipped to 'false' later on, once the deprecation period expires; it will be kept 'true' only for old machine types that used to support the legacy option, so existing configurations that use it won't break. Signed-off-by: Igor Mammedov --- Notes: v4: * drop idea to use "qom-list-properties" and use MachineInfo instead which could be inspected with 'query-machines' include/hw/boards.h | 3 +++ hw/arm/virt.c | 1 + hw/i386/pc.c| 1 + hw/ppc/spapr.c | 1 + qapi/misc.json | 5 ++++- vl.c| 1 + 6 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 6f7916f..86894b6 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -158,6 +158,8 @@ typedef struct { * @kvm_type: *Return the type of KVM corresponding to the kvm-type string option or *computed based on other criteria such as the host kernel capabilities. 
+ * @numa_mem_supported: + *true if '-numa node,mem' option is supported and false otherwise */ struct MachineClass { /*< private >*/ @@ -210,6 +212,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; +bool numa_mem_supported; HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index bf54f10..481a603 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; +mc->numa_mem_supported = true; } static void virt_instance_init(Object *obj) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2632b73..05b8368 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2747,6 +2747,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; +mc->numa_mem_supported = true; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2ef3ce4..265ecfb 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; +mc->numa_mem_supported = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..d0bdccb 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -2018,12 +2018,15 @@ # # @hotpluggable-cpus: cpu hotplug via -device is supported (since 2.7.0) # +# @numa-mem-supported: true if '-numa node,mem' option is supported by machine +# type and false otherwise (since 
4.1) +# # Since: 1.2.0 ## { 'struct': 'MachineInfo', 'data': { 'name': 'str', '*alias': 'str', '*is-default': 'bool', 'cpu-max': 'int', -'hotpluggable-cpus': 'bool'} } +'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool'} } ## # @query-machines: diff --git a/vl.c b/vl.c index 5550bd7..5bf17f5 100644 --- a/vl.c +++ b/vl.c @@ -1520,6 +1520,7 @@ MachineInfoList *qmp_query_machines(Error **errp) info->name = g_strdup(mc->name); info->cpu_max = !mc->max_cpus ? 1 : mc->max_cpus; info->hotpluggable_cpus = mc->has_hotpluggable_cpus; +info->numa_mem_supported = mc->numa_mem_supported; entry = g_malloc0(sizeof(*entry)); entry->value = info; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution
Changes since v3: - simplify series by dropping idea of showing property values in "qom-list-properties" and use MachineInfo in QAPI schema instead Changes since v2: - taking into account previous review, implement a way for mgmt to introspect if '-numa node,mem' is supported by machine type as suggested by Daniel at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html * amend "qom-list-properties" to show property values * add "numa-mem-supported" machine property to reflect if '-numa node,mem=SZ' is supported. It could be used with '-machine none' or at runtime with --preconfig before numa memory mappings are configured * minor fixes to deprecation documentation mentioning "numa-mem-supported" property 1) "I'm considering deprecating -mem-path/prealloc CLI options and replacing them with a single memdev Machine property to allow interested users to pick the backend used for initial RAM (fixes mixed -mem-path+hostmem backends issues) and as a transition step to modeling initial RAM as a Device instead of (ab)using MemoryRegion APIs." (for more details see: https://www.mail-archive.com/qemu-devel@nongnu.org/msg596314.html) However there are a couple of roadblocks on the way (s390x and numa memory handling). I think I finally thought out a way to hack s390x in a migration compatible manner, but I don't see any way to do it for -numa node,mem and default RAM assignment to nodes. Considering both numa use cases aren't meaningfully using NUMA (aside from guest-side testing) and could be replaced with an explicitly used memdev parameter, I'd like to propose removing these fake NUMA friends on new machine types, hence this deprecation. And once the last machine type that supported the option is removed we would be able to remove the option altogether. 
As a result of removing deprecated options and replacing initial RAM allocation with 'memdev's (1), QEMU will allocate guest RAM in a consistent way, fixing the mixed use case and allowing boards to move towards modelling initial RAM as Device(s), which in turn should allow cleaning up the NUMA/HMP/memory accounting code further by dropping ad-hoc node_mem tracking and reusing memory device enumeration instead. Reference to previous versions: * https://www.mail-archive.com/qemu-devel@nongnu.org/msg617694.html CC: libvir-list@redhat.com CC: ehabk...@redhat.com CC: pbonz...@redhat.com CC: berra...@redhat.com CC: arm...@redhat.com Igor Mammedov (3): machine: show if CLI option '-numa node,mem' is supported in QAPI schema numa: deprecate 'mem' parameter of '-numa node' option numa: deprecate implicit memory distribution between nodes include/hw/boards.h | 3 +++ hw/arm/virt.c| 1 + hw/i386/pc.c | 1 + hw/ppc/spapr.c | 1 + numa.c | 5 +++++ qapi/misc.json | 5 ++++- qemu-deprecated.texi | 24 ++++++++++++++++++++++++ vl.c | 1 + 8 files changed, 40 insertions(+), 1 deletion(-) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v4 2/3] numa: deprecate 'mem' parameter of '-numa node' option
The parameter allows configuring a fake NUMA topology, where the guest VM simulates a NUMA topology but doesn't actually get performance benefits from it. The same or better results could be achieved using the 'memdev' parameter. Besides unpredictable performance, the '-numa node,mem' option has other issues when it's used in combination with -mem-path + -mem-prealloc + memdev backends (pc-dimm), breaking binding of memdev backends, since mem-path/mem-prealloc are global and affect most RAM allocations. It's possible to make memdevs and the global -mem-path/mem-prealloc play nicely together, but that would just complicate already complicated code and add unobvious ways it could break on two different memory allocation paths and their combinations. Instead, consolidate all guest RAM allocation over memdev, which still allows creating fake NUMA configurations if desired and leaves one simplified code path to consider when it comes to guest RAM allocation. To achieve the desired simplification, deprecate the 'mem' parameter, as its ad-hoc partitioning of the initial RAM MemoryRegion can't be translated to a memdev-based backend transparently to users and in a compatible manner (migration wise). Later down the road that will allow consolidating how guest RAM is allocated and will permit us to clean up quite a bit of memory allocation and numa code, leaving only the 'memdev' implementation in place. 
Signed-off-by: Igor Mammedov --- Notes: v4: * fix up documentation to mention where users should look to check if -numa node,mem is supported numa.c | 2 ++ qemu-deprecated.texi | 16 ++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/numa.c b/numa.c index 3875e1e..2205773 100644 --- a/numa.c +++ b/numa.c @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, if (node->has_mem) { numa_info[nodenr].node_mem = node->mem; +warn_report("Parameter -numa node,mem is deprecated," +" use -numa node,memdev instead"); } if (node->has_memdev) { Object *o; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 50292d8..eb347f5 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -82,6 +82,22 @@ The @code{-realtime mlock=on|off} argument has been replaced by the The ``-virtfs_synth'' argument is now deprecated. Please use ``-fsdev synth'' and ``-device virtio-9p-...'' instead. +@subsection -numa node,mem=@var{size} (since 4.1) + +The parameter @option{mem} of @option{-numa node} is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage the specified +RAM chunk on the host side (e.g. bind it to a host node, set a bind policy, ...), +so the guest ends up with a fake NUMA configuration with suboptimal performance. +However, since 2014 there has been an alternative way to assign RAM to a NUMA node +using parameter @option{memdev}, which does the same as @option{mem} and adds +means to actually manage node RAM on the host side. Use parameter @option{memdev} +with the @var{memory-backend-ram} backend as a replacement for parameter @option{mem} +to achieve the same fake NUMA effect or a properly configured +@var{memory-backend-file} backend to actually benefit from the NUMA configuration. +In the future, new machine versions will not accept the option but it will still +work with old machine types. 
Users can check the QAPI schema to see if the legacy +option is supported by looking at the MachineInfo::numa-mem-supported property. + @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
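For users migrating away from the deprecated parameter, the replacement the documentation above recommends can be sketched as a small, hypothetical helper that builds the equivalent memdev-based arguments. The helper name is invented for illustration; the option syntax follows the deprecation text (memory-backend-ram keeps the "fake NUMA" behaviour, while a memory-backend-file with host-side binding is what adds real NUMA benefits).

```python
def mem_to_memdev(node_id, size):
    """Hypothetical helper: build the '-object'/'-numa' argument pair
    that replaces a legacy '-numa node,nodeid=N,mem=SIZE'."""
    backend = f"mem{node_id}"
    return [
        "-object", f"memory-backend-ram,id={backend},size={size}",
        "-numa", f"node,nodeid={node_id},memdev={backend}",
    ]

# '-numa node,nodeid=0,mem=1G' becomes:
assert mem_to_memdev(0, "1G") == [
    "-object", "memory-backend-ram,id=mem0,size=1G",
    "-numa", "node,nodeid=0,memdev=mem0",
]
```

Unlike the 'mem' form, the backend object produced here can later be extended with host-side options (e.g. a file path or binding policy) without changing the '-numa' part.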
Re: [libvirt] [Qemu-devel] [PATCH v3 1/6] pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size()
On Mon, 27 May 2019 18:36:25 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > QEMU will crash when device-memory-region-size property is read if > > ms->device_memory > > wasn't initialized yet (ex: property being inspected during preconfig > > time). > > Reproduced: > > $ qemu-system-x86_64 -nodefaults -S -display none -preconfig -qmp stdio > {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 4}, > "package": "v4.0.0-828-ga7b21f6762"}, "capabilities": ["oob"]}} > {"execute": "qmp_capabilities"} > {"return": {}} > {"execute": "qom-get", "arguments": {"path": "/machine", "property": > "device-memory-region-size"}} > Segmentation fault (core dumped) > > First time I started looking at this series, I went "I'll need a > reproducer to fully understand what's up, and I don't feel like finding > one now; next series, please". Second time, I had to spend a few > minutes on the reproducer. Wasn't hard, since you provided a clue. > Still: make review easy, include a reproducer whenever you can. sure > > > Instead of crashing return 0 if ms->device_memory hasn't been initialized. > > > > Signed-off-by: Igor Mammedov > > --- > > hw/i386/pc.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > > index d98b737..de91e90 100644 > > --- a/hw/i386/pc.c > > +++ b/hw/i386/pc.c > > @@ -2461,7 +2461,11 @@ pc_machine_get_device_memory_region_size(Object > > *obj, Visitor *v, > > Error **errp) > > { > > MachineState *ms = MACHINE(obj); > > -int64_t value = memory_region_size(&ms->device_memory->mr); > > +int64_t value = 0; > > + > > +if (ms->device_memory) { > > +value = memory_region_size(&ms->device_memory->mr); > > +} > > > > visit_type_int(v, name, &value, errp); > > } > > This makes qom-get return 0 for the size of memory that doesn't exist, > yet. > > A possible alternative would be setting an error. > > Opinions? 
We don't have a notion of a property not being set in QOM, so code that receives a text-based error would have to parse it (a horrible idea) to avoid generating the related ACPI parts. When memory hotplug is not enabled, PC_MACHINE_DEVMEM_REGION_SIZE == 0 is a valid value and it's what's expected by other code. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
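The design choice defended in the reply above, returning 0 for an unconfigured region rather than raising an error, can be modeled with a small illustrative sketch. This is hypothetical, dict-based Python, not QEMU code; it only mirrors the shape of the fixed getter.

```python
def device_memory_region_size(machine):
    """Model of the fixed getter: if device memory (memory hotplug) was
    never configured, report 0 instead of crashing or raising an error.
    0 is also the legitimate value when hotplug is disabled, which is
    why clients don't need a separate "not set" signal."""
    dm = machine.get("device_memory")
    return dm["size"] if dm is not None else 0

assert device_memory_region_size({}) == 0
assert device_memory_region_size({"device_memory": {"size": 1 << 30}}) == 1 << 30
```

A client probing at preconfig time therefore sees the same value, 0, whether hotplug is disabled or simply not configured yet, which is exactly the trade-off discussed above.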
Re: [libvirt] [Qemu-devel] [PATCH v3 4/6] numa: introduce "numa-mem-supported" machine property
On Mon, 27 May 2019 20:38:57 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > The '-numa mem' option has a number of issues and mgmt often defaults > > to it. Unfortunately it's not possible to replace it with the alternative > > '-numa memdev' without breaking migration compatibility. > > To be precise: -numa node,mem=... and -numa node,memdev=... Correct? yep, I'll try to use the full syntax so it would be clear to others. > > What's possible > > though is to deprecate it, keeping the option working with old machine types. > > Once the deprecation period expires, QEMU will disable '-numa mem' option > > usage on new machine types and when the last machine type that supported > > it is removed we would be able to remove '-numa mem' with associated code. > > > > In order to help mgmt find out if the deprecated CLI option > > '-numa mem=SZ' is still supported by a particular machine type, expose > > this information via the "numa-mem-supported" machine property. > > > > Users can use the "qom-list-properties" QMP command to list machine type > > properties including initial property values (when probing for supported > > machine types with '-machine none') or at runtime at preconfig time > > before the numa mapping is configured and decide if they should use the legacy > > '-numa mem' or the alternative '-numa memdev' option. > > This sentence is impenetrable, I'm afraid :) > > If we only want to convey whether a machine type supports -numa > node,mem=..., then adding a flag to query-machines suffices. Since I'm > pretty sure you'd have figured that out yourself, I suspect I'm missing I didn't know about query-machines, hence I implemented the "qom-list-properties" approach as was discussed at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html For the purpose of deprecating '-numa node,mem', query-machines is more than enough. I'll drop patches 1-3 and respin the series using query-machines. > something. Can you give me some examples of intended usage? 
Perhaps there will be future use cases where introspecting the 'defaults' of objects is needed; then we could look back into qom-list-properties if there isn't a better alternative. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 3/6] qmp: qmp_qom_list_properties(): ignore empty string options
Current QAPI semantics return an empty "" string in case a string property value hasn't been set (i.e. is NULL). Do not show the initial value in this case in the "qom-list-properties" command output, to reduce clutter. Signed-off-by: Igor Mammedov --- qmp.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/qmp.c b/qmp.c index 8415541..463c7d4 100644 --- a/qmp.c +++ b/qmp.c @@ -41,6 +41,7 @@ #include "qom/object_interfaces.h" #include "hw/mem/memory-device.h" #include "hw/acpi/acpi_dev_interface.h" +#include "qapi/qmp/qstring.h" NameInfo *qmp_query_name(Error **errp) { @@ -596,7 +597,16 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, if (obj) { info->q_default = object_property_get_qobject(obj, info->name, NULL); -info->has_q_default = !!info->q_default; +if (info->q_default) { + if (qobject_type(info->q_default) == QTYPE_QSTRING) { + QString *value = qobject_to(QString, info->q_default); + if (!strcmp(qstring_get_str(value), "")) { + qobject_unref(info->q_default); + info->q_default = NULL; + } + } + info->has_q_default = !!info->q_default; +} } entry = g_malloc0(sizeof(*entry)); -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
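The filtering logic of the C hunk above can be illustrated with a short, hypothetical sketch: drop the reported default when it is an empty string, keep everything else. The property data is invented, and the 'default' key is only an approximation of the QAPI member name used by qom-list-properties.

```python
def filter_defaults(props):
    """Model of the C change above: drop a reported default value when
    it is an empty string, i.e. the string property was never set
    (NULL in QOM); non-empty and non-string defaults are kept."""
    cleaned = []
    for p in props:
        if p.get("default") == "":
            # Equivalent of qobject_unref() + clearing has_q_default.
            p = {k: v for k, v in p.items() if k != "default"}
        cleaned.append(p)
    return cleaned

props = [{"name": "kernel", "default": ""},
         {"name": "accel", "default": "tcg"}]
out = filter_defaults(props)
assert "default" not in out[0]   # unset string default suppressed
assert out[1]["default"] == "tcg"  # real default preserved
```

The point of the change is purely cosmetic for consumers: an absent member, rather than "", signals "never set", which reduces clutter in introspection output.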
[libvirt] [PATCH v3 4/6] numa: introduce "numa-mem-supported" machine property
The '-numa mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's not possible to replace it with the alternative '-numa memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping the option working with old machine types. Once the deprecation period expires, QEMU will disable '-numa mem' option usage on new machine types, and when the last machine type that supported it is removed we will be able to remove '-numa mem' with its associated code. To help mgmt find out whether the deprecated CLI option '-numa mem=SZ' is still supported by a particular machine type, expose this information via the "numa-mem-supported" machine property. Users can use the "qom-list-properties" QMP command to list machine type properties including initial property values (when probing for supported machine types with '-machine none'), or at runtime at preconfig time before the numa mapping is configured, and decide if they should use the legacy '-numa mem' or the alternative '-numa memdev' option. 
Signed-off-by: Igor Mammedov --- include/hw/boards.h | 1 + hw/arm/virt.c | 1 + hw/core/machine.c | 12 ++++++++++++ hw/i386/pc.c| 1 + hw/ppc/spapr.c | 1 + 5 files changed, 16 insertions(+) diff --git a/include/hw/boards.h b/include/hw/boards.h index 6f7916f..9e347cf 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -210,6 +210,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; +bool numa_mem_supported; HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 5331ab7..2e86c78 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; +mc->numa_mem_supported = true; } static void virt_instance_init(Object *obj) diff --git a/hw/core/machine.c b/hw/core/machine.c index 5d046a4..8bc53ba 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -506,6 +506,13 @@ static char *machine_get_nvdimm_persistence(Object *obj, Error **errp) return g_strdup(ms->nvdimms_state->persistence_string); } +static bool machine_get_numa_mem_supported(Object *obj, Error **errp) +{ +MachineClass *mc = MACHINE_GET_CLASS(obj); + +return mc->numa_mem_supported; +} + static void machine_set_nvdimm_persistence(Object *obj, const char *value, Error **errp) { @@ -810,6 +817,11 @@ static void machine_class_init(ObjectClass *oc, void *data) &error_abort); object_class_property_set_description(oc, "memory-encryption", "Set memory encryption object to use", &error_abort); + +object_class_property_add_bool(oc, "numa-mem-supported", +machine_get_numa_mem_supported, NULL, &error_abort); +object_class_property_set_description(oc, "numa-mem-supported", +"Shows if legacy '-numa mem=SIZE' option is supported", &error_abort); } static void 
machine_class_base_init(ObjectClass *oc, void *data) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index de91e90..bec0055 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2756,6 +2756,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; +mc->numa_mem_supported = true; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2ef3ce4..265ecfb 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; +mc->numa_mem_supported = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
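The probing flow described in the commit message can be sketched from the management side. This is a hypothetical Python model, not libvirt code: the reply shape mirrors the ObjectPropertyInfo additions from this series (including the optional 'default' member from patch 2/6), and the helper names are made up.

```python
# Hypothetical sketch: inspect a "qom-list-properties" reply (e.g. taken with
# '-machine none') for the machine type's "numa-mem-supported" property and
# pick the matching -numa CLI variant. Property/field names follow this
# series; everything else is invented for illustration.

def numa_mem_supported(props):
    """props: list of ObjectPropertyInfo dicts from qom-list-properties."""
    for p in props:
        if p.get("name") == "numa-mem-supported":
            # 'default' carries the initial property value (patch 2/6)
            return bool(p.get("default", False))
    return False  # property absent: QEMU too old to report it

def numa_cli_args(props, node_mb, backend_id="ram0"):
    """Build the -numa arguments for one node of node_mb megabytes."""
    if numa_mem_supported(props):
        return ["-numa", "node,mem=%dM" % node_mb]          # legacy path
    return ["-object",
            "memory-backend-ram,id=%s,size=%dM" % (backend_id, node_mb),
            "-numa", "node,memdev=%s" % backend_id]         # modern path

# Example replies (shape per the amended ObjectPropertyInfo struct):
old_machine = [{"name": "numa-mem-supported", "type": "bool", "default": True}]
new_machine = [{"name": "numa-mem-supported", "type": "bool", "default": False}]
```

Treating an absent property as "not supported" is the conservative choice here: a QEMU predating this series cannot be probed, so mgmt would fall back to the memdev syntax.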
[libvirt] [PATCH v3 6/6] numa: deprecate implicit memory distribution between nodes
Implicit RAM distribution between nodes has exactly the same issues as: "numa: deprecate 'mem' parameter of '-numa node' option" only with QEMU being the user that's 'adding' the 'mem' parameter. Deprecate it, to get it out of the way so that we can consolidate guest RAM allocation using memory backends, making it consistent, and possibly later on transition to using memory devices instead of ad-hoc memory mapping of initial RAM. --- v3: - update deprecation doc, s/4.0/4.1/ - mention that legacy 'mem' option could also be used to provide explicit memory distribution for old machine types Signed-off-by: Igor Mammedov --- numa.c | 3 +++ qemu-deprecated.texi | 8 2 files changed, 11 insertions(+) diff --git a/numa.c b/numa.c index 2205773..6d45a1f 100644 --- a/numa.c +++ b/numa.c @@ -409,6 +409,9 @@ void numa_complete_configuration(MachineState *ms) if (i == nb_numa_nodes) { assert(mc->numa_auto_assign_ram); mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size); +warn_report("Default splitting of RAM between nodes is deprecated," +" use '-numa node,memdev' to explicitly define RAM" +" allocation per node"); } numa_total = 0; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 995a96c..546f722 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -88,6 +88,14 @@ In future new machine versions will not accept the option but it will keep working with old machine types. User can inspect read-only machine property 'numa-mem-supported' to check if specific machine type (not) supports the option. +@subsection -numa node (without memory specified) (since 4.1) + +Splitting RAM by default between NUMA nodes has the same issues as the @option{mem} +parameter described above, with the difference that it is QEMU itself that plays +the role of the user, via an implicit generic or board-specific splitting rule. +Use @option{memdev} with a @var{memory-backend-ram} backend or @option{mem} (if +it's supported by the machine type in use) to define the mapping explicitly instead. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 5/6] numa: deprecate 'mem' parameter of '-numa node' option
The parameter allows configuring a fake NUMA topology, where the guest VM simulates a NUMA topology without actually getting any performance benefit from it. The same or better results could be achieved using the 'memdev' parameter. In light of that, any VM that uses NUMA to get its benefits should use 'memdev'. To allow transitioning initial RAM to a device-based model, deprecate the 'mem' parameter, as its ad-hoc partitioning of the initial RAM MemoryRegion can't be translated to a memdev-based backend transparently to users and in a compatible manner (migration wise). That will also allow cleaning up our numa code a bit, leaving only the 'memdev' impl. in place and several boards that use node_mem to generate FDT/ACPI description from it. Signed-off-by: Igor Mammedov --- v3: * mention "numa-mem-supported" machine property in deprecation documentation. --- numa.c | 2 ++ qemu-deprecated.texi | 16 2 files changed, 18 insertions(+) diff --git a/numa.c b/numa.c index 3875e1e..2205773 100644 --- a/numa.c +++ b/numa.c @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, if (node->has_mem) { numa_info[nodenr].node_mem = node->mem; +warn_report("Parameter -numa node,mem is deprecated," +" use -numa node,memdev instead"); } if (node->has_memdev) { Object *o; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 842e71b..995a96c 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -72,6 +72,22 @@ backend settings instead of environment variables. To ease migration to the new format, the ``-audiodev-help'' option can be used to convert the current values of the environment variables to ``-audiodev'' options. +@subsection -numa node,mem=@var{size} (since 4.1) + +The parameter @option{mem} of @option{-numa node} is used to assign a part of +guest RAM to a NUMA node. 
But when using it, it's impossible to manage the specified +size on the host side (like binding it to a host node, setting a bind policy, ...), +so the guest ends up with a fake NUMA configuration with suboptimal performance. +However, since 2014 there is an alternative way to assign RAM to a NUMA node +using the parameter @option{memdev}, which does the same as @option{mem} and provides +means to actually manage node RAM on the host side. Use parameter @option{memdev} +with a @var{memory-backend-ram} backend as a replacement for parameter @option{mem} +to achieve the same fake NUMA effect, or a properly configured +@var{memory-backend-file} backend to actually benefit from the NUMA configuration. +In future, new machine versions will not accept the option but it will keep +working with old machine types. Users can inspect the read-only machine property +'numa-mem-supported' to check if a specific machine type does (not) support the option. + @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
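As a rough illustration of what the numa.c hunk above does, here is a toy Python model of parse_numa_node() — a sketch with simplified names, not QEMU's implementation: the legacy 'mem' path now emits a deprecation warning, while the 'memdev' path stays silent.

```python
# Toy model of the patched parse_numa_node(): record each node's memory
# configuration and warn (once per 'mem' use) that the option is deprecated.

warnings = []

def warn_report(msg):
    # stand-in for QEMU's warn_report(); just collects messages here
    warnings.append(msg)

def parse_numa_node(node, numa_info):
    """node: dict of -numa node,... options; numa_info: per-node state."""
    if "mem" in node:
        numa_info[node["nodeid"]] = {"node_mem": node["mem"]}
        warn_report("Parameter -numa node,mem is deprecated,"
                    " use -numa node,memdev instead")
    elif "memdev" in node:
        numa_info[node["nodeid"]] = {"memdev": node["memdev"]}
```

Note the warning fires per parsed node, matching the placement of the warn_report() call inside the has_mem branch of the real C code.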
[libvirt] [PATCH v3 1/6] pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size()
QEMU will crash when the device-memory-region-size property is read if ms->device_memory wasn't initialized yet (e.g. the property being inspected during preconfig time). Instead of crashing, return 0 if ms->device_memory hasn't been initialized. Signed-off-by: Igor Mammedov --- hw/i386/pc.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d98b737..de91e90 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2461,7 +2461,11 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v, Error **errp) { MachineState *ms = MACHINE(obj); -int64_t value = memory_region_size(&ms->device_memory->mr); +int64_t value = 0; + +if (ms->device_memory) { +value = memory_region_size(&ms->device_memory->mr); +} visit_type_int(v, name, &value, errp); } -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 0/6] numa: deprecate '-numa node, mem' and default memory distribution
Changes since v2: - taking into account previous review, implement a way for mgmt to introspect if '-numa node,mem' is supported by a machine type, as suggested by Daniel at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html * amend "qom-list-properties" to show property values * add "numa-mem-supported" machine property to reflect if '-numa node,mem=SZ' is supported. It could be used with '-machine none' or at runtime with --preconfig before numa memory mappings are configured * minor fixes to deprecation documentation mentioning "numa-mem-supported" property 1) "I'm considering deprecating -mem-path/prealloc CLI options and replacing them with a single memdev Machine property to allow interested users to pick the backend used for initial RAM (fixes mixed -mem-path+hostmem backends issues) and as a transition step to modeling initial RAM as a Device instead of (ab)using MemoryRegion APIs." (for more details see: https://www.mail-archive.com/qemu-devel@nongnu.org/msg596314.html) However there are a couple of roadblocks on the way (s390x and numa memory handling). I think I finally thought out a way to hack s390x in a migration-compatible manner, but I don't see any way to do it for -numa node,mem and default RAM assignment to nodes. Considering both numa usecases aren't meaningfully using NUMA (aside from guest-side testing) and could be replaced with an explicitly used memdev parameter, I'd like to propose removing these fake NUMA friends on new machine types, hence this deprecation. And once the last machine type that supported the option is removed, we would be able to remove the option altogether. As a result of removing the deprecated options and replacing initial RAM allocation with 'memdev's (1), QEMU will allocate guest RAM in a consistent way, fixing the mixed use-case and allowing boards to move towards modelling initial RAM as Device(s). 
That in turn should allow cleaning up the NUMA/HMP/memory accounting code further by dropping ad-hoc node_mem tracking and reusing memory device enumeration instead. Reference to previous versions: * [PATCH 0/2] numa: deprecate -numa node, mem and default memory distribution https://www.mail-archive.com/qemu-devel@nongnu.org/msg600706.html * [PATCH] numa: warn if numa 'mem' option or default RAM splitting between nodes is used. https://www.mail-archive.com/qemu-devel@nongnu.org/msg602136.html * [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used. https://www.spinics.net/linux/fedora/libvir/msg180917.html CC: libvir-list@redhat.com CC: ehabk...@redhat.com CC: pbonz...@redhat.com CC: berra...@redhat.com CC: arm...@redhat.com Igor Mammedov (6): pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size() qmp: make "qom-list-properties" show initial property values qmp: qmp_qom_list_properties(): ignore empty string options numa: introduce "numa-mem-supported" machine property numa: deprecate 'mem' parameter of '-numa node' option numa: deprecate implicit memory distribution between nodes include/hw/boards.h | 1 + hw/arm/virt.c| 1 + hw/core/machine.c| 12 hw/i386/pc.c | 7 ++- hw/ppc/spapr.c | 1 + numa.c | 5 + qapi/misc.json | 5 - qemu-deprecated.texi | 24 qmp.c| 15 +++ 9 files changed, 69 insertions(+), 2 deletions(-) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 2/6] qmp: make "qom-list-properties" show initial property values
Add in the command output object's property values right after creation (i.e. state of the object returned by object_new() or equivalent). Follow up patch will add machine property 'numa-mem-supported', which would allow mgmt to introspect which machine types (versions) still support legacy "-numa mem=FOO" CLI option and which don't and require alternative '-numa memdev' option being used. Signed-off-by: Igor Mammedov --- qapi/misc.json | 5 - qmp.c | 5 + 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..e333285 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -1365,10 +1365,13 @@ # # @description: if specified, the description of the property. # +# @default: initial property value. +# # Since: 1.2 ## { 'struct': 'ObjectPropertyInfo', - 'data': { 'name': 'str', 'type': 'str', '*description': 'str' } } + 'data': { 'name': 'str', 'type': 'str', '*description': 'str', +'*default': 'any' } } ## # @qom-list: diff --git a/qmp.c b/qmp.c index b92d62c..8415541 100644 --- a/qmp.c +++ b/qmp.c @@ -593,6 +593,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, info->type = g_strdup(prop->type); info->has_description = !!prop->description; info->description = g_strdup(prop->description); +if (obj) { +info->q_default = +object_property_get_qobject(obj, info->name, NULL); +info->has_q_default = !!info->q_default; +} entry = g_malloc0(sizeof(*entry)); entry->value = info; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 20 Mar 2019 15:24:42 + Daniel P. Berrangé wrote: > On Wed, Mar 20, 2019 at 04:20:19PM +0100, Igor Mammedov wrote: > > > This could be solved if QEMU has some machine type based property > > > that indicates whether "memdev" is required for a given machine, > > > but crucially *does not* actually activate that property until > > > several releases later. > > > > > > We're too late for 4.0, so let's consider QEMU 4.1 as the > > > next release of QEMU, which opens for dev in April 2019. > > > > > > QEMU 4.1 could introduce a machine type property "requires-memdev" > > > which defaults to "false" for all existing machine types. It > > > could add a deprecation that says a *future* machine type will > > > report "requires-memdev=true". IOW, "pc-i440fx-4.1" and > > > "pc-i440fx-4.2" must still report "requires-memdev=false", > > > > > > Libvirt 5.4.0 (May 2019) can now add support for "requires-memdev" > > > property. This would be effectively a no-op at time of this libvirt > > > release, since no QEMU would be reporting "requires-memdev=true" > > > for many months to come yet. > > > > > > Now, after 2 QEMU releases with the deprecation warning, when > > > the QEMU 5.0.0 dev cycle opens in Jan 2020, the new "pc-i440fx-5.0" > > > machine type can be made to report "requires-memdev=true". > > > > > > IOW, in April 2020 when QEMU 5.0.0 comes out, "mem" would > > > no longer be supported for new machine types. Libvirt at this > > ^^^ > > > > > time would be up to 6.4.0 but that's co-incidental since it > > > would already be doing the right thing since 5.4.0. > > > > > > IOW, this QEMU 5.0.0 would work correctly with libvirt versions > > > in the range 5.4.0 to 6.4.0 (and future). 
> > > > > If a user had libvirt < 5.4.0 (ie older than May 2019) nothing > > > would stop them using the "pc-i440fx-5.0" machine type, but > > > libvirt would be liable to use "mem" instead of "memdev" and > > > > > if that happened they would be unable to live migrate to a > > > host newer libvirt which honours "requires-memdev=true" > > I failed to parse this section in connection with the '^'-underlined part, > > I'm reading 'no longer be supported' as it's not possible to start > > QEMU -M machine_foo.requires-memdev=true with 'mem' option. > > Is it what you've meant? > > I wasn't actually meaning QEMU to forbid it when i wrote this, > but on reflection, it would make sense to forbid it, as that > would avoid the user getting into a messy situation with > versions of libvirt that predate knowledge of the requires-memdev > property. Forbidding is my goal as it (at least for new machine types): - removes the possibility of mis-configuration - allows new machines to switch to the frontend-backend memory model in a clean way, consolidating/unifying memory management (i.e. no need to map 'mem' to memdev, which from a recent migration experiment appears to be impossible to do reliably) - allows removing 'mem' and all related code from QEMU someday, once the last old machine type where it was possible to use it is removed (well, it's a rather far-fetched goal; for that we need to come up with a schedule/policy for how/when we would deprecate old machines). > > > So in summary the key to being able to tie deprecations to machine > > > type versions, is for QEMU to add a mechanism to report the desired > > > new feature usage approach against the machine type, but then ensure > > > the mechanism continues to report the old approach for 2 more releases. > > > > so that makes QEMU deprecation period effectively 3 releases (assuming > > 4 months cadence). > > There's a distinction between releases and development cycles here. 
> The deprecation policy is defined as 2 releases, which means between > 2 and 3 development cycles depending on when in the dev cycle the > deprecation is added (start vs the end of the dev cycle) > > Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 20 Mar 2019 13:46:59 + Daniel P. Berrangé wrote: > On Wed, Mar 20, 2019 at 10:32:53AM -0300, Eduardo Habkost wrote: > > On Wed, Mar 20, 2019 at 11:51:51AM +, Daniel P. Berrangé wrote: > > > On Wed, Mar 20, 2019 at 11:26:34AM +0100, Igor Mammedov wrote: > > [...] [...] > > If a feature is deprecated, I would expect the management stack > > to stop using the deprecated feature by default as soon as > > possible, not 1 year after it was deprecated. > > True, but the challenge here is that we need to stop using the > feature in a way that isn't going to break ability to live migrate > VMs spawned by previous versions of libvirt. The VM should be able to start in the first place: if we disable 'mem' on new machines, old libvirt using 'mem' won't be able to start a VM with it, so it will never even get to the migration point. (It's a clear signal to the user about a mis-configured host; at least this old/new issue shouldn't happen downstream, as downstream ships a compatible set of packages.) [...] > > Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 20 Mar 2019 11:51:51 + Daniel P. Berrangé wrote: > On Wed, Mar 20, 2019 at 11:26:34AM +0100, Igor Mammedov wrote: > > On Tue, 19 Mar 2019 14:51:07 + > > Daniel P. Berrangé wrote: > > > > > On Tue, Mar 19, 2019 at 02:08:01PM +0100, Igor Mammedov wrote: > > > > On Thu, 7 Mar 2019 10:07:05 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Wed, Mar 06, 2019 at 07:54:17PM +0100, Igor Mammedov wrote: > > > > > > On Wed, 6 Mar 2019 18:16:08 + > > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > > > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > > > > > > > > > > > > > > Amend -numa option docs and print warnings if 'mem' option or > > > > > > > > default RAM > > > > > > > > splitting between nodes is used. It's intended to discourage > > > > > > > > users from using > > > > > > > > configuration that allows only to fake NUMA on guest side while > > > > > > > > leading > > > > > > > > to reduced performance of the guest due to inability to > > > > > > > > properly configure > > > > > > > > VM's RAM on the host. > > > > > > > > > > > > > > > > In NUMA case, it's recommended to always explicitly configure > > > > > > > > guest RAM > > > > > > > > using -numa node,memdev={backend-id} option. 
> > > > > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > > > --- > > > > > > > > numa.c | 5 + > > > > > > > > qemu-options.hx | 12 > > > > > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > > > diff --git a/numa.c b/numa.c > > > > > > > > index 3875e1e..42838f9 100644 > > > > > > > > --- a/numa.c > > > > > > > > +++ b/numa.c > > > > > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState > > > > > > > > *ms, NumaNodeOptions *node, > > > > > > > > > > > > > > > > if (node->has_mem) { > > > > > > > > numa_info[nodenr].node_mem = node->mem; > > > > > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > > > > > +" use -numa node,memdev instead"); > > > > > > > > > > > > > > My comments from v1 still apply. We must not do this as long as > > > > > > > libvirt has no choice but to continue using this feature. > > > > > > It has a choice to use 'memdev' whenever creating a new VM and > > > > > > continue > > > > > > using 'mem' with exiting VMs. > > > > > > > > > > Unfortunately we don't have such a choice. Libvirt has no concept of > > > > > the > > > > > distinction between an 'existing' and 'new' VM. It just receives an > > > > > XML > > > > > file from the mgmt application and with transient guests, we have no > > > > > persistent configuration record of the VM. So we've no way of knowing > > > > > whether this VM was previously running on this same host, or another > > > > > host, or is completely new. 
> > > > In case of transient VM, libvirt might be able to use machine version > > > > as deciding which option to use (memdev is around more than 4 years > > > > since 2.1) > > > > (or QEMU could provide introspection into what machine version > > > > (not)supports, > > > > like it was discussed before) > > > > > > > > As discussed elsewhere (v1 tread|IRC), there are users (mainly CI) for > > > > which > > > > fake NUMA is sufficient and they do not ask for explicit pinning, so > > > > libvirt > > > > defaults to legacy -numa node,mem option. > > > > Those users do not care no aware that they should use memdev instead > > > > (I'm n
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Tue, 19 Mar 2019 14:51:07 + Daniel P. Berrangé wrote: > On Tue, Mar 19, 2019 at 02:08:01PM +0100, Igor Mammedov wrote: > > On Thu, 7 Mar 2019 10:07:05 + > > Daniel P. Berrangé wrote: > > > > > On Wed, Mar 06, 2019 at 07:54:17PM +0100, Igor Mammedov wrote: > > > > On Wed, 6 Mar 2019 18:16:08 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > > > > > Amend -numa option docs and print warnings if 'mem' option or > > > > > > default RAM > > > > > > splitting between nodes is used. It's intended to discourage users > > > > > > from using > > > > > > configuration that allows only to fake NUMA on guest side while > > > > > > leading > > > > > > to reduced performance of the guest due to inability to properly > > > > > > configure > > > > > > VM's RAM on the host. > > > > > > > > > > > > In NUMA case, it's recommended to always explicitly configure guest > > > > > > RAM > > > > > > using -numa node,memdev={backend-id} option. > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > --- > > > > > > numa.c | 5 + > > > > > > qemu-options.hx | 12 > > > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/numa.c b/numa.c > > > > > > index 3875e1e..42838f9 100644 > > > > > > --- a/numa.c > > > > > > +++ b/numa.c > > > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > > > > > NumaNodeOptions *node, > > > > > > > > > > > > if (node->has_mem) { > > > > > > numa_info[nodenr].node_mem = node->mem; > > > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > > > +" use -numa node,memdev instead"); > > > > > > > > > > My comments from v1 still apply. We must not do this as long as > > > > > libvirt has no choice but to continue using this feature. > > > > It has a choice to use 'memdev' whenever creating a new VM and continue > > > > using 'mem' with exiting VMs. > > > > > > Unfortunately we don't have such a choice. 
Libvirt has no concept of the > > > distinction between an 'existing' and 'new' VM. It just receives an XML > > > file from the mgmt application and with transient guests, we have no > > > persistent configuration record of the VM. So we've no way of knowing > > > whether this VM was previously running on this same host, or another > > > host, or is completely new. > > In case of transient VM, libvirt might be able to use machine version > > as deciding which option to use (memdev is around more than 4 years since > > 2.1) > > (or QEMU could provide introspection into what machine version > > (not)supports, > > like it was discussed before) > > > > As discussed elsewhere (v1 tread|IRC), there are users (mainly CI) for which > > fake NUMA is sufficient and they do not ask for explicit pinning, so libvirt > > defaults to legacy -numa node,mem option. > > Those users do not care no aware that they should use memdev instead > > (I'm not sure if they are able to ask libvirt for non pinned numa memory > > which results in memdev being used). > > This patch doesn't obsolete anything yet, it serves purpose to inform users > > that they are using legacy option and advises replacement option > > so that users would know to what they should adapt to. > > > > Once we deprecate and then remove 'mem' for new machines only (while keeping > > 'mem' working on old machine versions). The new nor old libvirt won't be > > able > > to start new machine type with 'mem' option and have to use memdev variant, > > so we don't have migration issues with new machines and old ones continue > > working with 'mem'. > > I'm not seeing what has changed which would enable us to deprecate > something only for new machines. That's not possible from libvirt's > POV as old
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Sun, 10 Mar 2019 11:14:08 +0100 Markus Armbruster wrote: > Daniel P. Berrangé writes: > > > On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote: > >> Daniel P. Berrangé writes: > >> > >> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: > >> >> If we deprecate outdated NUMA configurations now, we can start rejecting > >> >> them with new machine types after a suitable grace period. > >> > > >> > How is libvirt going to know what machines it can use with the feature ? > >> > We don't have any way to introspect machine type specific logic, since we > >> > run all probing with "-machine none", and QEMU can't report anything > >> > about > >> > machines without instantiating them. > >> > >> Fair point. A practical way for management applications to decide which > >> of the two interfaces they can use with which machine type may be > >> required for deprecating one of the interfaces with new machine types. > > > > We currently have "qom-list-properties" which can report on the > > existance of properties registered against object types. What it > > can't do though is report on the default values of these properties. > > Yes. > > > What's interesting though is that qmp_qom_list_properties will actually > > instantiate objects in order to query properties, if the type isn't an > > abstract type. > > If it's an abstract type, qom-list-properties returns the properties > created with object_class_property_add() & friends, typically by the > class_init method. This is possible without instantiating the type. > > If it's a concrete type, qom-list-properties additionally returns the > properties created with object_property_add(), typically by the > instance_init() method. This requires instantiating the type. > > Both kinds of properties can be added or deleted at any time. For > instance, setting a property value with object_property_set() or similar > could create additional properties. 
> > For historical reasons, we often use object_property_add() where > object_class_property_add() would do. Sad. > > > IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell > > you do > > > >(QEMU) qom-list-properties typename=pc-q35-2.6-machine > > > > it will have actually instantiated the pc-q35-2.6-machine machine type. > > Since it has instantiated the machine, the object initializer function > > will have run and initialized the default values for various properties. > > > > IOW, it is possible for qom-list-properties to report on default values > > for non-abstract types. > > instance_init() also initializes the properties' values. > qom-list-properties could show these initial values (I hesitate calling > them default values). > > Setting a property's value can change other properties' values by side > effect. > > My point is: the properties qom-list-properties shows and the initial > values it could show are not necessarily final. QOM is designed to be > maximally flexible, and flexibility brings along its bosom-buddy > complexity. > > If you keep that in mind, qom-list-properties can be put to good use all > the same. > > A way to report "default values" (really: whatever the values are after > object_new()) feels like a fair feature request to me, if backed by an > actual use case. Looks like trying to migrate from 'mem' to 'memdev' just creates another train-wreck (where libvirt would have to hunt for the 'right' backend configuration to make migration work, and even that would be a best-effort attempt). If that worked reliably, I'd go for it since it would allow dropping the 'mem' codepath altogether, but it doesn't look possible. So I'll look into adding machine-level introspection and deprecating the 'mem' option for new machine types. > [...] > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
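Markus's distinction between class properties and instance properties can be modelled in a few lines. This is a toy sketch with invented names, not the actual QOM API: class-level properties are visible without instantiation, while instance properties only appear after the equivalent of object_new() — which is why qom-list-properties must instantiate concrete types, and why only then can it report their initial values.

```python
# Toy model of the QOM behaviour described above. Properties are stored as
# name -> initial value; real QOM properties are richer objects.

class QOMType:
    class_properties = {}                # added by class_init (always visible)

    def __init__(self):
        self.instance_properties = {}    # added by instance_init

def list_properties(type_cls, instantiate):
    """Mimics qom-list-properties: class properties come for free; instance
    properties (and their initial values) require instantiating the type."""
    props = dict(type_cls.class_properties)
    if instantiate:                      # only possible for concrete types
        props.update(type_cls().instance_properties)
    return props

class MachineNone(QOMType):
    # class property from this series (initial value per machine_class_init)
    class_properties = {"numa-mem-supported": True}

    def __init__(self):
        super().__init__()
        # hypothetical instance property, invisible without object_new()
        self.instance_properties = {"accel": "tcg"}
```

As Markus notes, these are *initial* values, not finals: in real QOM, setting one property can add, remove, or change others as a side effect, so the listing is a snapshot taken right after construction.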
Re: [libvirt] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Thu, 7 Mar 2019 10:07:05 + Daniel P. Berrangé wrote: > On Wed, Mar 06, 2019 at 07:54:17PM +0100, Igor Mammedov wrote: > > On Wed, 6 Mar 2019 18:16:08 + > > Daniel P. Berrangé wrote: > > > > > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > > > Amend -numa option docs and print warnings if 'mem' option or default > > > > RAM > > > > splitting between nodes is used. It's intended to discourage users from > > > > using > > > > configuration that allows only to fake NUMA on guest side while leading > > > > to reduced performance of the guest due to inability to properly > > > > configure > > > > VM's RAM on the host. > > > > > > > > In NUMA case, it's recommended to always explicitly configure guest RAM > > > > using -numa node,memdev={backend-id} option. > > > > > > > > Signed-off-by: Igor Mammedov > > > > --- > > > > numa.c | 5 + > > > > qemu-options.hx | 12 > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/numa.c b/numa.c > > > > index 3875e1e..42838f9 100644 > > > > --- a/numa.c > > > > +++ b/numa.c > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > > > NumaNodeOptions *node, > > > > > > > > if (node->has_mem) { > > > > numa_info[nodenr].node_mem = node->mem; > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > +" use -numa node,memdev instead"); > > > > > > My comments from v1 still apply. We must not do this as long as > > > libvirt has no choice but to continue using this feature. > > It has a choice to use 'memdev' whenever creating a new VM and continue > > using 'mem' with exiting VMs. > > Unfortunately we don't have such a choice. Libvirt has no concept of the > distinction between an 'existing' and 'new' VM. It just receives an XML > file from the mgmt application and with transient guests, we have no > persistent configuration record of the VM. 
So we've no way of knowing > whether this VM was previously running on this same host, or another > host, or is completely new. In the case of a transient VM, libvirt might be able to use the machine version for deciding which option to use (memdev has been around for more than 4 years, since 2.1) (or QEMU could provide introspection into what a machine version does (not) support, like it was discussed before). As discussed elsewhere (v1 thread|IRC), there are users (mainly CI) for which fake NUMA is sufficient and they do not ask for explicit pinning, so libvirt defaults to the legacy -numa node,mem option. Those users do not care nor are they aware that they should use memdev instead (I'm not sure if they are able to ask libvirt for non-pinned numa memory which results in memdev being used). This patch doesn't obsolete anything yet; it serves the purpose of informing users that they are using a legacy option and advises a replacement option so that users know what they should adapt to. Once we deprecate and then remove 'mem' for new machines only (while keeping 'mem' working on old machine versions), neither new nor old libvirt will be able to start a new machine type with the 'mem' option and both will have to use the memdev variant, so we don't have migration issues with new machines and old ones continue working with 'mem'. That keeps QEMU's promise not to break existing configurations while letting us move forward with new machines. > Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Mon, 4 Mar 2019 14:52:30 +0100 Igor Mammedov wrote: > On Fri, 1 Mar 2019 18:01:52 + > "Dr. David Alan Gilbert" wrote: > > > * Igor Mammedov (imamm...@redhat.com) wrote: > > > On Fri, 1 Mar 2019 15:49:47 + > > > Daniel P. Berrangé wrote: > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > > The parameter allows to configure fake NUMA topology where guest > > > > > VM simulates NUMA topology but not actually getting a performance > > > > > benefits from it. The same or better results could be achieved > > > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > > to get its benefits should use 'memdev' and to allow transition > > > > > initial RAM to device based model, deprecate 'mem' parameter as > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > > translated to memdev based backend transparently to users and in > > > > > compatible manner (migration wise). > > > > > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > > > 'memdev' impl. in place and several boards that use node_mem > > > > > to generate FDT/ACPI description from it. > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > are 100% live migration compatible in both directions ? Libvirt > > > > would need this to be the case in order to use the 'memdev' syntax > > > > instead. > > > Unfortunately they are not migration compatible in any direction, > > > if it were possible to translate them to each other I'd alias 'mem' > > > to 'memdev' without deprecation. The former sends over only one > > > MemoryRegion to target, while the latter sends over several (one per > > > memdev). > > > > > > Mixed memory issue[1] first came from libvirt side RHBZ1624223, > > > back then it was resolved on libvirt side in favor of migration > > > compatibility vs correctness (i.e. bind policy doesn't work as expected). 
> > > What's worse is that it was made the default and affects all new machines, > > > as I understood it. > > > > > > In case of -mem-path + -mem-prealloc (with 1 numa node or numa less) > > > it's possible on QEMU side to make conversion to memdev in migration > > > compatible way (that's what stopped Michal from memdev approach). > > > But it's hard to do so in multi-nodes case as amount of MemoryRegions > > > is different. > > > > > > Point is to consider 'mem' as mis-configuration error, as the user > > > in the first place using broken numa configuration > > > (i.e. fake numa configuration doesn't actually improve performance). > > > > > > CCed David, maybe he could offer a way to do 1:n migration and other > > > way around. > > > > I can't see a trivial way. > > About the easiest I can think of is if you had a way to create a memdev > > that was an alias to pc.ram (of a particular size and offset). > If I get you right that's what I was planning to do for numa-less machines > that use -mem-path/prealloc options, where it's possible to replace > an initial RAM MemoryRegion with a correspondingly named memdev and its > backing MemoryRegion. > But I don't see how it could work in case of legacy NUMA 'mem' options > where initial RAM is 1 MemoryRegion (it's a fake numa after all) and how to > translate that into several MemoryRegions (one per node/memdev). Limiting it to x86 for demo purposes, what would work (if*) is to create a special MemoryRegion container, i.e.:
1. make the MemoryRegion created by memory_region_allocate_system_memory():memory_region_init() special: it already has the id 'pc.ram' and a size that matches the single RAMBlock with the same id in the incoming migration stream from the OLD qemu (started with -numa node,mem=x ... options)
2. register the region from step 1 with vmstate_register_ram_global() (or another API) which under the cover makes the migration code split the single incoming RAMBlock into several smaller consecutive RAMBlocks represented by memdev backends that are mapped as subregions within the container 'pc.ram'
3. in case of backward migration the container MemoryRegion 'pc.ram' serves the other way around, stitching the memdev subregions back into the single 'pc.
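A very rough sketch of the container idea proposed above (C-flavoured pseudocode, not a working patch; the split/stitch of the incoming RAMBlock across subregions would require migration-code support that does not exist, and the variable names are invented here):

```c
/* Pseudocode sketch of the 'pc.ram' container idea from this thread.
 * ASSUMPTION: hypothetical migration support that can split one
 * incoming RAMBlock across the container's subregions. */
MemoryRegion *container = g_new0(MemoryRegion, 1);

/* 1. container named 'pc.ram', sized like the legacy single RAMBlock */
memory_region_init(container, OBJECT(machine), "pc.ram", machine->ram_size);

uint64_t offset = 0;
for (int i = 0; i < nb_numa_nodes; i++) {
    /* one memdev-backed region per node, mapped consecutively so the
     * layout matches the old contiguous initial RAM */
    MemoryRegion *seg = host_memory_backend_get_memory(node_memdev[i]);
    memory_region_add_subregion(container, offset, seg);
    offset += memory_region_size(seg);
}

/* 2./3. registering the container would have to teach the migration
 * code to split incoming state into the subregions (old -> new) and to
 * stitch the subregions back into one block (new -> old) */
vmstate_register_ram_global(container);
```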
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Sun, 10 Mar 2019 11:16:33 +0100 Markus Armbruster wrote: > Daniel P. Berrangé writes: > > > On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote: > >> On Mon, 4 Mar 2019 16:35:16 + > >> Daniel P. Berrangé wrote: > >> > >> > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > >> > > We couldn't have done that. How we would migrate from older qemu? > >> > > > >> > > Anyway, now that I look into this (esp. git log) I came accross: > >> > > > >> > > commit f309db1f4d51009bad0d32e12efc75530b66836b > >> > > Author: Michal Privoznik > >> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > >> > > Commit: Michal Privoznik > >> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > >> > > > >> > > qemu: Create memory-backend-{ram,file} iff needed > >> > > > >> > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to > >> > > generated > >> > > newer cmd line but then for various reasong (e.g. avoiding triggering > >> > > a qemu > >> > > bug) we turned it off and make libvirt default to older (now > >> > > deprecated) cmd > >> > > line. > >> > > > >> > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > >> > > migration from deprecated to new cmd line (unlikely, if not impossible, > >> > > right?) then I guess the only approach we can have is that: > >> > > > >> > > 1) whenever so called cold booting a new machine (fresh, brand new > >> > > start of > >> > > a new domain) libvirt would default to modern cmd line, > >> > > > >> > > 2) on migration, libvirt would record in the migration stream (or > >> > > status XML > >> > > or wherever) that modern cmd line was generated and thus it'll make the > >> > > destination generate modern cmd line too. 
> >> > > > > This solution still suffers a couple of problems: > >> > > a) migration to older libvirt will fail as older libvirt won't > >> > > recognize the > >> > > flag set in 2) and therefore would default to deprecated cmd line > >> > > b) migrating from one host to another won't modernize the cmd line > >> > > > >> > > But I guess we have to draw a line somewhere (if we are not willing to > >> > > write > >> > > those migration patches). > >> > > >> > Yeah supporting backwards migration is a non-optional requirement from at > >> > least one of the mgmt apps using libvirt, so breaking the new to old case > >> > is something we always aim to avoid. > >> Aiming for support of > >> "new QEMU + new machine type" => "old QEMU + non-existing machine type" > >> seems a bit difficult. > > > > That's not the scenario that's the problem. The problem is > > > >new QEMU + new machine type + new libvirt -> new QEMU + new machine > > type + old libvirt > > > > Previously released versions of libvirt will happily use any new machine > > type that QEMU introduces. So we can't make new libvirt use different > > options, only for new machine types, as old libvirt supports those machine > > types too. > > Avoiding tight coupling between QEMU and libvirt versions makes sense, > because having to upgrade stuff in lock-step is such a pain. > > Does not imply we must support arbitrary combinations of QEMU and > libvirt versions. Isn't it typically the job of downstream to ship a bundle that works together, and isn't that a rather limited set? E.g. System 1 (libvirt 0, QEMU 0, machine 0.1 (latest)) could be migrated, both ways, to System 2 (libvirt 1, QEMU 1, machine 0.1 (still the same old machine)). While installing QEMU 1 on System 1 might work (if it doesn't break due to dependencies) and might even be able to start machine 1.0, wouldn't that really fall into the unsupported category? 
> Unless upstream libvirt's test matrix covers all versions of libvirt > against all released versions of QEMU, "previously released versions of > libvirt will continue to work with new QEMU" is largely an empty promise > anyway. The real promise is more like "we won't break it intentionally; > good luck". > > Mind, I'm not criticizing that real promis
Re: [libvirt] [PATCH] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Thu, 7 Mar 2019 10:04:56 + Daniel P. Berrangé wrote: > On Wed, Mar 06, 2019 at 07:48:22PM +0100, Igor Mammedov wrote: > > On Wed, 6 Mar 2019 17:10:37 + > > Daniel P. Berrangé wrote: > > > > > On Wed, Mar 06, 2019 at 05:58:35PM +0100, Igor Mammedov wrote: > > > > On Wed, 6 Mar 2019 16:39:38 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Wed, Mar 06, 2019 at 05:30:25PM +0100, Igor Mammedov wrote: > > > > > > Amend -numa option docs and print warnings if 'mem' option or > > > > > > default RAM > > > > > > splitting between nodes is used. It's intended to discourage users > > > > > > from using > > > > > > configuration that allows only to fake NUMA on guest side while > > > > > > leading > > > > > > to reduced performance of the guest due to inability to properly > > > > > > configure > > > > > > VM's RAM on the host. > > > > > > > > > > > > In NUMA case, it's recommended to always explicitly configure guest > > > > > > RAM > > > > > > using -numa node,memdev={backend-id} option. > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > --- > > > > > > numa.c | 5 + > > > > > > qemu-options.hx | 12 > > > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/numa.c b/numa.c > > > > > > index 3875e1e..c6c2a6f 100644 > > > > > > --- a/numa.c > > > > > > +++ b/numa.c > > > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > > > > > NumaNodeOptions *node, > > > > > > > > > > > > if (node->has_mem) { > > > > > > numa_info[nodenr].node_mem = node->mem; > > > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > > > +" use -numa node,memdev instead"); > > > > > > > > > > I don't think we should do this. Libvirt isn't going to stop using > > > > > this > > > > > option in the near term. 
When users see warnings like this in logs > > > > well when it was the only option available libvirt had no other choice, > > > > but since memdev became available libvirt should try to use it whenever > > > > possible. > > > > > > As we previously discussed, it is not possible for libvirt to use it > > > in all cases. > > > > > > > > > > > > they'll often file bugs reports thinking something is broken which is > > > > > not the case here. > > > > It's the exact purpose of the warning, to force user asking questions > > > > and fix configuration, since he/she obviously not getting NUMA benefits > > > > and/or performance-wise > > > > > > That's only useful if it is possible to do something about the problem. > > > Libvirt wants to use the new option but it can't due to the live migration > > > problems. So this simply leads to bug reports that will end up marked > > > as CANTFIX. > > The problem could be solved by user though, by reconfiguring and restarting > > domain since it's impossible to (at least as it stands now wrt migration). > > > > > I don't believe libvirt actually suffers from the performance problem > > > you describe wrt lack of pinning. When we attempt to pin guest NUMA > > > nodes to host NUMA nodes, libvirt *will* use "memdev". IIUC, we > > > use "mem" in the case where there /no/ requested pinning of guest > > > NUMA nodes, and so we're not suffering from the limitations of "mem" > > > in that case. > > What would be the use-case for not pinning numa nodes? > > If user isn't asking for pinning, VM would run with degraded performance and > > it would be better of being non-numa. > > The guest could have been originally booted on a host which has 2 NUMA > nodes and have been migrated to a host with 1 NUMA node, in which case > pinnning is not relevant. > > For CI purposes too it is reasonable to create guests with NUMA configurations > that bear no resemblance to the
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Mon, 4 Mar 2019 17:20:13 +0100 Michal Privoznik wrote: > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > >> On Mon, 4 Mar 2019 12:39:08 + > >> Daniel P. Berrangé wrote: > >> > >>> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > >>>> On Mon, 04 Mar 2019 08:13:53 +0100 > >>>> Markus Armbruster wrote: > >>>> > >>>>> Daniel P. Berrangé writes: > >>>>> > >>>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > >>>>>>> On Fri, 1 Mar 2019 15:49:47 + > >>>>>>> Daniel P. Berrangé wrote: > >>>>>>> > >>>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > >>>>>>>>> The parameter allows to configure fake NUMA topology where guest > >>>>>>>>> VM simulates NUMA topology but not actually getting a performance > >>>>>>>>> benefits from it. The same or better results could be achieved > >>>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA > >>>>>>>>> to get its benefits should use 'memdev' and to allow transition > >>>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as > >>>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be > >>>>>>>>> translated to memdev based backend transparently to users and in > >>>>>>>>> compatible manner (migration wise). > >>>>>>>>> > >>>>>>>>> That will also allow to clean up a bit our numa code, leaving only > >>>>>>>>> 'memdev' impl. in place and several boards that use node_mem > >>>>>>>>> to generate FDT/ACPI description from it. > >>>>>>>> > >>>>>>>> Can you confirm that the 'mem' and 'memdev' parameters to -numa > >>>>>>>> are 100% live migration compatible in both directions ? Libvirt > >>>>>>>> would need this to be the case in order to use the 'memdev' syntax > >>>>>>>> instead. > >>>>>>> Unfortunately they are not migration compatible in any direction, > >>>>>>> if it where possible to translate them to each other I'd alias 'mem' > >>>>>>> to 'memdev' without deprecation. 
The former sends over only one > >>>>>>> MemoryRegion to target, while the later sends over several (one per > >>>>>>> memdev). > >>>>>> > >>>>>> If we can't migration from one to the other, then we can not deprecate > >>>>>> the existing 'mem' syntax. Even if libvirt were to provide a config > >>>>>> option to let apps opt-in to the new syntax, we need to be able to > >>>>>> support live migration of existing running VMs indefinitely. > >>>>>> Effectively > >>>>>> this means we need the to keep 'mem' support forever, or at least such > >>>>>> a long time that it effectively means forever. > >>>>>> > >>>>>> So I think this patch has to be dropped & replaced with one that > >>>>>> simply documents that memdev syntax is preferred. > >>>>> > >>>>> We have this habit of postulating absolutes like "can not deprecate" > >>>>> instead of engaging with the tradeoffs. We need to kick it. > >>>>> > >>>>> So let's have an actual look at the tradeoffs. > >>>>> > >>>>> We don't actually "support live migration of existing running VMs > >>>>> indefinitely". > >>>>> > >>>>> We support live migration to any newer version of QEMU that still > >>>>> supports the machine type. > >>>>> > >>>>> We support live migration to any older version of QEMU that already > >>>>> supports the machine type and all the devices the machine uses. > >>>>> > >>>>> Aside: "support" is really an h
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Mon, 4 Mar 2019 16:35:16 + Daniel P. Berrangé wrote: > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > > > > On Mon, 4 Mar 2019 12:39:08 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > > > > > On Mon, 04 Mar 2019 08:13:53 +0100 > > > > > > Markus Armbruster wrote: > > > > > > > Daniel P. Berrangé writes: > > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > > > > > > > On Fri, 1 Mar 2019 15:49:47 + > > > > > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov > > > > > > > > > > wrote: > > > > > > > > > > > The parameter allows to configure fake NUMA topology > > > > > > > > > > > where guest > > > > > > > > > > > VM simulates NUMA topology but not actually getting a > > > > > > > > > > > performance > > > > > > > > > > > benefits from it. The same or better results could be > > > > > > > > > > > achieved > > > > > > > > > > > using 'memdev' parameter. In light of that any VM that > > > > > > > > > > > uses NUMA > > > > > > > > > > > to get its benefits should use 'memdev' and to allow > > > > > > > > > > > transition > > > > > > > > > > > initial RAM to device based model, deprecate 'mem' > > > > > > > > > > > parameter as > > > > > > > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't > > > > > > > > > > > be > > > > > > > > > > > translated to memdev based backend transparently to users > > > > > > > > > > > and in > > > > > > > > > > > compatible manner (migration wise). > > > > > > > > > > > > > > > > > > > > > > That will also allow to clean up a bit our numa code, > > > > > > > > > > > leaving only > > > > > > > > > > > 'memdev' impl. 
in place and several boards that use > > > > > > > > > > > node_mem > > > > > > > > > > > to generate FDT/ACPI description from it. > > > > > > > > > > > > > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to > > > > > > > > > > -numa > > > > > > > > > > are 100% live migration compatible in both directions ? > > > > > > > > > > Libvirt > > > > > > > > > > would need this to be the case in order to use the 'memdev' > > > > > > > > > > syntax > > > > > > > > > > instead. > > > > > > > > > Unfortunately they are not migration compatible in any > > > > > > > > > direction, > > > > > > > > > if it where possible to translate them to each other I'd > > > > > > > > > alias 'mem' > > > > > > > > > to 'memdev' without deprecation. The former sends over only > > > > > > > > > one > > > > > > > > > MemoryRegion to target, while the later sends over several > > > > > > > > > (one per > > > > > > > > > memdev). > > > > > > > > > > > > > > > > If we can't migration from one to the other, then we can not > > > > > > > > deprecate > > > > > > > > the existing 'mem' syntax. Even if libvirt were to provide a > > > > > > > > config > > > > > > > > option to let apps opt-in to the new syntax, we need to be able > > > > > > > > to > > >
Re: [libvirt] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 6 Mar 2019 18:16:08 + Daniel P. Berrangé wrote: > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > Amend -numa option docs and print warnings if 'mem' option or default RAM > > splitting between nodes is used. It's intended to discourage users from > > using > > configuration that allows only to fake NUMA on guest side while leading > > to reduced performance of the guest due to inability to properly configure > > VM's RAM on the host. > > > > In NUMA case, it's recommended to always explicitly configure guest RAM > > using -numa node,memdev={backend-id} option. > > > > Signed-off-by: Igor Mammedov > > --- > > numa.c | 5 + > > qemu-options.hx | 12 > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > diff --git a/numa.c b/numa.c > > index 3875e1e..42838f9 100644 > > --- a/numa.c > > +++ b/numa.c > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > NumaNodeOptions *node, > > > > if (node->has_mem) { > > numa_info[nodenr].node_mem = node->mem; > > +warn_report("Parameter -numa node,mem is obsolete," > > +" use -numa node,memdev instead"); > > My comments from v1 still apply. We must not do this as long as > libvirt has no choice but to continue using this feature. It has a choice to use 'memdev' whenever creating a new VM and continue using 'mem' with existing VMs. 
> > > } > > if (node->has_memdev) { > > Object *o; > > @@ -407,6 +409,9 @@ void numa_complete_configuration(MachineState *ms) > > if (i == nb_numa_nodes) { > > assert(mc->numa_auto_assign_ram); > > mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, > > ram_size); > > +warn_report("Default splitting of RAM between nodes is > > obsolete," > > +" Use '-numa node,memdev' to explicitly define RAM" > > +" allocation per node"); > > } > > > > numa_total = 0; > > diff --git a/qemu-options.hx b/qemu-options.hx > > index 1cf9aac..61035cb 100644 > > --- a/qemu-options.hx > > +++ b/qemu-options.hx > > @@ -206,10 +206,14 @@ For example: > > -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1 > > @end example > > > > -@samp{mem} assigns a given RAM amount to a node. @samp{memdev} > > -assigns RAM from a given memory backend device to a node. If > > -@samp{mem} and @samp{memdev} are omitted in all nodes, RAM is > > -split equally between them. > > +@samp{memdev} assigns RAM from a given memory backend device to a node. > > + > > +Legacy options/behaviour: @samp{mem} assigns a given RAM amount to a node. > > +If @samp{mem} and @samp{memdev} are omitted in all nodes, RAM is split > > equally > > +between them. Option @samp{mem} and default RAM splitting are obsolete as > > they > > +do not provide means to manage RAM on the host side and only allow QEMU to > > fake > > +NUMA support which in practice could degrade VM performance. > > +It's advised to always explicitly configure NUMA RAM by using the > > @samp{memdev} option. > > > > @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore, > > if one node uses @samp{memdev}, all of them have to use it. > > -- > > 2.7.4 > > > > Regards, > Daniel
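For completeness, a sketch of the host-side management that @samp{memdev} enables and @samp{mem} cannot express: each backend can be bound to a host NUMA node (ids, sizes and host node numbers below are illustrative, not taken from the patch):

```shell
# Each guest node's RAM comes from a backend bound to a specific host
# NUMA node; the legacy 'mem' option has no equivalent of this.
qemu-system-x86_64 -m 4G \
    -object memory-backend-ram,id=m0,size=2G,host-nodes=0,policy=bind \
    -object memory-backend-ram,id=m1,size=2G,host-nodes=1,policy=bind \
    -numa node,nodeid=0,memdev=m0 \
    -numa node,nodeid=1,memdev=m1
```

Without the host-nodes/policy properties the same memdev syntax still works and merely reproduces the guest-visible topology, which is why a memdev-only configuration can also cover the non-pinned (CI-style) use case discussed earlier in the thread.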