Re: [PATCH v2] Deprecate the "-no-acpi" command line switch
On Fri, 24 Feb 2023 10:05:43 +0100 Thomas Huth wrote:
> Similar to "-no-hpet", the "-no-acpi" switch is a legacy command
> line option that should be replaced with the "acpi" machine parameter
> nowadays.
>
> Signed-off-by: Thomas Huth

Reviewed-by: Igor Mammedov

> ---
> v2: Fixed stupid copy-n-paste bug (Thanks to Sunil for spotting it!)
>
>  docs/about/deprecated.rst | 6 ++++++
>  softmmu/vl.c              | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index ee95bcb1a6..15084f7bea 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -99,6 +99,12 @@ form is preferred.
>  The HPET setting has been turned into a machine property.
>  Use ``-machine hpet=off`` instead.
>
> +``-no-acpi`` (since 8.0)
> +''''''''''''''''''''''''
> +
> +The ``-no-acpi`` setting has been turned into a machine property.
> +Use ``-machine acpi=off`` instead.
> +
>  ``-accel hax`` (since 8.0)
>  ''''''''''''''''''''''''''
>
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 459588aa7d..a3c59b5462 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3271,6 +3271,7 @@ void qemu_init(int argc, char **argv)
>              vnc_parse(optarg);
>              break;
>          case QEMU_OPTION_no_acpi:
> +            warn_report("-no-acpi is deprecated, use '-machine acpi=off' instead");
>              qdict_put_str(machine_opts_dict, "acpi", "off");
>              break;
>          case QEMU_OPTION_no_hpet:
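The mapping behind this deprecation is mechanical: each legacy switch becomes a `-machine` property. A minimal sketch of that translation (the `modernize` helper and its lookup table are invented for illustration; the flag-to-property pairs themselves come from the deprecation notes above):

```python
# Hypothetical helper: rewrite deprecated QEMU switches into the
# modern '-machine key=value' form named in docs/about/deprecated.rst.
DEPRECATED_TO_MACHINE = {
    "-no-hpet": ("hpet", "off"),
    "-no-acpi": ("acpi", "off"),
}

def modernize(argv):
    """Replace each deprecated switch with a '-machine key=off' pair."""
    out = []
    for arg in argv:
        if arg in DEPRECATED_TO_MACHINE:
            key, val = DEPRECATED_TO_MACHINE[arg]
            out += ["-machine", f"{key}={val}"]
        else:
            out.append(arg)
    return out

print(modernize(["qemu-system-x86_64", "-no-acpi", "-m", "2G"]))
# -> ['qemu-system-x86_64', '-machine', 'acpi=off', '-m', '2G']
```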
Re: [PATCH] tests: qemucapabilities: Update qemu caps dump for the qemu-7.0.0 release on x86_64
On Wed, 20 Apr 2022 14:13:00 +0200 Peter Krempa wrote:
> On Wed, Apr 20, 2022 at 14:00:52 +0200, Igor Mammedov wrote:
> > On Wed, 20 Apr 2022 12:21:03 +0100
> > Daniel P. Berrangé wrote:
> > > On Wed, Apr 20, 2022 at 01:15:43PM +0200, Igor Mammedov wrote:
> > > > On Wed, 20 Apr 2022 13:02:12 +0200
> > > > Peter Krempa wrote:
> > > > > Few minor changes in qemu since the last update:
> > > > > - PIIX4_PM gained 'x-not-migrate-acpi-index' property
> > > >
> > > > do you do this just for every new property?
> > > > (nothing outside of QEMU needs to know about x-not-migrate-acpi-index,
> > > > unless one is interested in whether it works or not)
> > >
> > > This is simply a record of what QEMU reports when you query properties
> > > for the devices libvirt cares about.
> >
> > I was just curious why libvirt does it.
> >
> > > If nothing outside is supposed to
> > > know about x-not-migrate-acpi-index then QEMU shouldn't tell us about
> > > it when asked for properties :-)
> >
> > Does libvirt use/expose x- prefixed properties anywhere?
> > (i.e. can QEMU hide them?)
>
> I don't think it's needed to hide them. In fact we have strong rules
> against using them.
>
> With one notable exception:
>
>   -object memory-backend-file,x-use-canonical-path-for-ramblock-id=
>
> But this was discussed extensively on the qemu list and qemu pledges
> that this specific property is considered stable.

OK, let's leave it as is.
Re: [PATCH] tests: qemucapabilities: Update qemu caps dump for the qemu-7.0.0 release on x86_64
On Wed, 20 Apr 2022 12:21:03 +0100 Daniel P. Berrangé wrote:
> On Wed, Apr 20, 2022 at 01:15:43PM +0200, Igor Mammedov wrote:
> > On Wed, 20 Apr 2022 13:02:12 +0200
> > Peter Krempa wrote:
> > > Few minor changes in qemu since the last update:
> > > - PIIX4_PM gained 'x-not-migrate-acpi-index' property
> >
> > do you do this just for every new property?
> > (nothing outside of QEMU needs to know about x-not-migrate-acpi-index,
> > unless one is interested in whether it works or not)
>
> This is simply a record of what QEMU reports when you query properties
> for the devices libvirt cares about.

I was just curious why libvirt does it.

> If nothing outside is supposed to
> know about x-not-migrate-acpi-index then QEMU shouldn't tell us about
> it when asked for properties :-)

Does libvirt use/expose x- prefixed properties anywhere?
(i.e. can QEMU hide them?)

> With regards,
> Daniel
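The rule Peter states below in the thread — never rely on experimental properties — is easy to apply mechanically when consuming a `device-list-properties` reply, since experimental properties are marked by the `x-` name prefix. A minimal sketch (the `stable_properties` helper is invented for illustration; the property names mirror the thread):

```python
def stable_properties(props):
    """Drop experimental 'x-'-prefixed entries from a
    device-list-properties reply, keeping only properties a
    management app may rely on (per libvirt's stated rule,
    modulo the one pledged exception discussed in the thread)."""
    return [p for p in props if not p["name"].startswith("x-")]

# Shape of a device-list-properties reply entry list (abbreviated).
reply = [
    {"name": "acpi-index", "type": "uint32"},
    {"name": "x-not-migrate-acpi-index", "type": "bool"},
]
print([p["name"] for p in stable_properties(reply)])
# -> ['acpi-index']
```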
Re: [PATCH] tests: qemucapabilities: Update qemu caps dump for the qemu-7.0.0 release on x86_64
On Wed, 20 Apr 2022 13:02:12 +0200 Peter Krempa wrote:
> Few minor changes in qemu since the last update:
> - PIIX4_PM gained 'x-not-migrate-acpi-index' property

do you do this just for every new property?
(nothing outside of QEMU needs to know about x-not-migrate-acpi-index,
unless one is interested in whether it works or not)

> - 'cocoa' display and corresponding props (not present in this build)
>
> Changes in build:
> - dbus display driver re-enabled
> - gtk display support re-disabled
> - xen support re-disabled
>
> Signed-off-by: Peter Krempa
> ---
>  .../caps_7.0.0.x86_64.replies | 583 --
>  .../caps_7.0.0.x86_64.xml     |  10 +-
>  2 files changed, 257 insertions(+), 336 deletions(-)
>
> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies
> index d1f453dcca..620442704a 100644
> --- a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies
> +++ b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.replies
> @@ -17,11 +17,11 @@
>  {
>    "return": {
>      "qemu": {
> -      "micro": 92,
> -      "minor": 2,
> -      "major": 6
> +      "micro": 0,
> +      "minor": 0,
> +      "major": 7
>      },
> -    "package": "v7.0.0-rc2"
> +    "package": "v7.0.0"
>    },
>    "id": "libvirt-2"
> }
> @@ -5119,10 +5119,6 @@
>        "name": "135",
>        "tag": "type",
>        "variants": [
> -        {
> -          "case": "gtk",
> -          "type": "358"
> -        },
>          {
>            "case": "curses",
>            "type": "360"
>          },
>          {
>            "case": "egl-headless",
>            "type": "361"
>          },
> +        {
> +          "case": "dbus",
> +          "type": "362"
> +        },
>          {
>            "case": "default",
>            "type": "0"
> @@ -10498,6 +10498,10 @@
>            "case": "qemu-vdagent",
>            "type": "518"
>          },
> +        {
> +          "case": "dbus",
> +          "type": "519"
> +        },
>          {
>            "case": "vc",
>            "type": "520"
> @@ -11756,9 +11760,6 @@
>          {
>            "name": "none"
>          },
> -        {
> -          "name": "gtk"
> -        },
>          {
>            "name": "sdl"
>          },
> @@ -11770,17 +11771,20 @@
>          },
>          {
>            "name": "spice-app"
> +        },
> +        {
> +          "name": "dbus"
>          }
>        ],
>        "meta-type": "enum",
>        "values": [
>          "default",
>          "none",
> -        "gtk",
>          "sdl",
>          "egl-headless",
>          "curses",
> -        "spice-app"
> +        "spice-app",
> +        "dbus"
>        ]
>      },
>      {
> @@ -16067,6 +16071,9 @@
>        {
>          "name": "qemu-vdagent"
>        },
> +      {
> +        "name": "dbus"
> +      },
>        {
>          "name": "vc"
>        },
> @@ -16097,6 +16104,7 @@
>        "spicevmc",
>        "spiceport",
>        "qemu-vdagent",
> +      "dbus",
>        "vc",
>        "ringbuf",
>        "memory"
> @@ -16202,6 +16210,16 @@
>        ],
>        "meta-type": "object"
>      },
> +    {
> +      "name": "519",
> +      "members": [
> +        {
> +          "name": "data",
> +          "type": "618"
> +        }
> +      ],
> +      "meta-type": "object"
> +    },
>      {
>        "name": "520",
>        "members": [
> @@ -18460,6 +18478,26 @@
>        ],
>        "meta-type": "object"
>      },
> +    {
> +      "name": "618",
> +      "members": [
> +        {
> +          "name": "logfile",
> +          "default": null,
> +          "type": "str"
> +        },
> +        {
> +          "name": "logappend",
> +          "default": null,
> +          "type": "bool"
> +        },
> +        {
> +          "name": "name",
> +          "type": "str"
> +        }
> +      ],
> +      "meta-type": "object"
> +    },
>      {
>        "name": "619",
>        "members": [
> @@ -20363,10 +20401,6 @@
>        "name": "acpi-erst",
>        "parent": "pci-device"
>      },
> -    {
> -      "name": "virtio-crypto-device",
> -      "parent": "virtio-device"
> -    },
>      {
>        "name": "isa-applesmc",
>        "parent": "isa-device"
>      },
> @@ -20379,49 +20413,53 @@
>        "name": "vhost-user-input-pci",
>        "parent": "vhost-user-input-pci-base-type"
>      },
> +    {
> +      "name": "usb-redir",
> +      "parent": "usb-device"
> +    },
>      {
>        "name": "floppy-bus",
>        "parent": "bus"
>      },
>      {
> -      "name": "Denverton-x86_64-cpu",
> -      "parent": "x86_64-cpu"
> +      "name": "virtio-crypto-device",
> +      "parent": "virtio-device"
>      },
>      {
>        "name": "chardev-testdev",
>        "parent": "chardev"
>      },
>      {
> -      "name": "usb-wacom-tablet",
> -      "parent": "usb-device"
> +      "name": "Denverton-x86_64-cpu",
> +      "parent": "x86_64-cpu"
>      },
>      {
> -      "name"
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Mon, 7 Feb 2022 11:48:27 + Daniel P. Berrangé wrote:
> On Mon, Feb 07, 2022 at 12:22:22PM +0100, Igor Mammedov wrote:
> > On Mon, 7 Feb 2022 10:36:42 +0100
> > Peter Krempa wrote:
> > > On Mon, Feb 07, 2022 at 10:18:43 +0100, Igor Mammedov wrote:
> > > > On Mon, 7 Feb 2022 09:14:37 +0100
> > > > Igor Mammedov wrote:
> > > > > On Sat, 5 Feb 2022 13:45:24 +0100
> > > > > Philippe Mathieu-Daudé wrote:
> > > > > > Previously CPUs were exposed in the QOM tree at a path
> > > > > >
> > > > > >   /machine/unattached/device[nn]
> > > > > >
> > > > > > where the 'nn' of the first CPU is usually zero, but can
> > > > > > vary depending on what devices were already created.
> > > > > >
> > > > > > With this change the CPUs are now at
> > > > > >
> > > > > >   /machine/cpu[nn]
> > > > > >
> > > > > > where the 'nn' of the first CPU is always zero.
> > > > >
> > > > > Could you add to commit message the reason behind the change?
> > > >
> > > > Regardless, it looks like unwarranted movement to me,
> > > > prompted by libvirt accessing/expecting a QOM path which is
> > > > not stable ABI. I'd rather get it fixed on the libvirt side.
> > > >
> > > > If libvirt needs for some reason to access a CPU instance,
> > > > it should use @query-hotpluggable-cpus to get a list of CPUs
> > > > (which includes the QOM path of already present CPUs) instead of
> > > > hard-coding some 'well-known' path, as there is no guarantee
> > > > that it will stay stable whatsoever.
> > >
> > > I don't disagree with you about the use of a hardcoded path, but the way
> > > of using @query-hotpluggable-cpus is not really aligning well with how
> > > it's being used.
> > >
> > > To shed a bit more light, libvirt uses the following hardcoded path
> > >
> > >   #define QOM_CPU_PATH "/machine/unattached/device[0]"
> > >
> > > in code which is used to query CPU flags. That code doesn't care at all
> > > which cpus are present but wants to get any of them. So yes, calling
> > > query-hotpluggable-cpus is possible but a bit pointless.
> >
> > Even though query-hotpluggable-cpus is cumbersome,
> > it still lets you avoid hard-coding the QOM path and lets you
> > get away with keeping the "~400 QMP calls" probing while
> > something better comes along.
> >
> > > In general the code probing cpu flags via qom-get is very cumbersome as
> > > it ends up doing ~400 QMP calls at startup of a VM in cases when we deem
> > > it necessary to probe the cpu fully.
> > >
> > > It would be much better (and would sidestep the issue altogether) if we
> > > had a more sane interface to probe all cpu flags in one go, and ideally
> > > the argument specifying the cpu being optional.
> > >
> > > Libvirt can do the adjustment, but for now IMO the path to the first cpu
> > > (/machine/unattached/device[0]) became de-facto ABI by the virtue that
> > > it was used by libvirt and if I remember correctly it was suggested by
> > > the folks dealing with the CPU when the code was added originally.
> >
> > I would've argued against that back then as well;
> > there weren't any guarantees and I wouldn't like a precedent of
> > QOM abuse becoming de-facto ABI.
> > Note: this patch breaks this so-called ABI as well and introduces
> > yet another hard-coded path without any stability guarantee whatsoever.
>
> AFAIK, we've never defined anything about QOM paths wrt ABI one way
> or the other ? In the absence of guidelines then it comes down to

not written in docs anyways (all I have is a vague recollection that
we really didn't want to make the QOM path/tree an ABI).
For more on this topic see the comment at the end.

> what are reasonable expectations of the mgmt app. These expectations
> will be influenced by what it is actually possible to achieve given
> our API as exposed.
>
> I think it is unreasonable to expect /machine/unattached to be
> stable because by its very nature it is just a dumping ground
> for anything where the dev hasn't put in any thought to the path
> placement. IOW, it was/is definitely a bad idea for libvirt to
> r
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Mon, 7 Feb 2022 10:36:42 +0100 Peter Krempa wrote:
> On Mon, Feb 07, 2022 at 10:18:43 +0100, Igor Mammedov wrote:
> > On Mon, 7 Feb 2022 09:14:37 +0100
> > Igor Mammedov wrote:
> > > On Sat, 5 Feb 2022 13:45:24 +0100
> > > Philippe Mathieu-Daudé wrote:
> > > > Previously CPUs were exposed in the QOM tree at a path
> > > >
> > > >   /machine/unattached/device[nn]
> > > >
> > > > where the 'nn' of the first CPU is usually zero, but can
> > > > vary depending on what devices were already created.
> > > >
> > > > With this change the CPUs are now at
> > > >
> > > >   /machine/cpu[nn]
> > > >
> > > > where the 'nn' of the first CPU is always zero.
> > >
> > > Could you add to commit message the reason behind the change?
> >
> > Regardless, it looks like unwarranted movement to me,
> > prompted by libvirt accessing/expecting a QOM path which is
> > not stable ABI. I'd rather get it fixed on the libvirt side.
> >
> > If libvirt needs for some reason to access a CPU instance,
> > it should use @query-hotpluggable-cpus to get a list of CPUs
> > (which includes the QOM path of already present CPUs) instead of
> > hard-coding some 'well-known' path, as there is no guarantee
> > that it will stay stable whatsoever.
>
> I don't disagree with you about the use of a hardcoded path, but the way
> of using @query-hotpluggable-cpus is not really aligning well with how
> it's being used.
>
> To shed a bit more light, libvirt uses the following hardcoded path
>
>   #define QOM_CPU_PATH "/machine/unattached/device[0]"
>
> in code which is used to query CPU flags. That code doesn't care at all
> which cpus are present but wants to get any of them. So yes, calling
> query-hotpluggable-cpus is possible but a bit pointless.

Even though query-hotpluggable-cpus is cumbersome,
it still lets you avoid hard-coding the QOM path and lets you
get away with keeping the "~400 QMP calls" probing while
something better comes along.

> In general the code probing cpu flags via qom-get is very cumbersome as
> it ends up doing ~400 QMP calls at startup of a VM in cases when we deem
> it necessary to probe the cpu fully.
>
> It would be much better (and would sidestep the issue altogether) if we
> had a more sane interface to probe all cpu flags in one go, and ideally
> the argument specifying the cpu being optional.
>
> Libvirt can do the adjustment, but for now IMO the path to the first cpu
> (/machine/unattached/device[0]) became de-facto ABI by the virtue that
> it was used by libvirt and if I remember correctly it was suggested by
> the folks dealing with the CPU when the code was added originally.

I would've argued against that back then as well;
there weren't any guarantees and I wouldn't like a precedent of
QOM abuse becoming de-facto ABI.
Note: this patch breaks this so-called ABI as well and introduces
yet another hard-coded path without any stability guarantee whatsoever.

> Even if we change it in libvirt right away, changing qemu will break
> forward compatibility. While we don't guarantee it, it still creates
> user grief.
>
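Igor's suggested approach — asking `query-hotpluggable-cpus` for a present CPU's QOM path instead of hard-coding `/machine/unattached/device[0]` — can be sketched as follows. Per the QAPI schema, only slots occupied by a present CPU carry a `qom-path` member; the helper name and sample reply below are illustrative:

```python
def first_cpu_qom_path(hotpluggable_cpus):
    """Return the QOM path of any present CPU from a
    query-hotpluggable-cpus reply, or None if no CPU is present.
    Entries without 'qom-path' describe empty (hotpluggable) slots."""
    for entry in hotpluggable_cpus:
        if "qom-path" in entry:
            return entry["qom-path"]
    return None

# Abbreviated reply: one empty slot, one present CPU.
reply = [
    {"props": {"socket-id": 1, "core-id": 0, "thread-id": 0},
     "vcpus-count": 1, "type": "host-x86_64-cpu"},
    {"props": {"socket-id": 0, "core-id": 0, "thread-id": 0},
     "vcpus-count": 1, "type": "host-x86_64-cpu",
     "qom-path": "/machine/unattached/device[0]"},
]
print(first_cpu_qom_path(reply))
# -> /machine/unattached/device[0]
```

This keeps the probing code independent of wherever a given machine type happens to attach its CPUs in the QOM tree.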
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Mon, 7 Feb 2022 09:14:37 +0100 Igor Mammedov wrote:
> On Sat, 5 Feb 2022 13:45:24 +0100
> Philippe Mathieu-Daudé wrote:
> > Previously CPUs were exposed in the QOM tree at a path
> >
> >   /machine/unattached/device[nn]
> >
> > where the 'nn' of the first CPU is usually zero, but can
> > vary depending on what devices were already created.
> >
> > With this change the CPUs are now at
> >
> >   /machine/cpu[nn]
> >
> > where the 'nn' of the first CPU is always zero.
>
> Could you add to commit message the reason behind the change?

Regardless, it looks like unwarranted movement to me,
prompted by libvirt accessing/expecting a QOM path which is
not stable ABI. I'd rather get it fixed on the libvirt side.

If libvirt needs for some reason to access a CPU instance,
it should use @query-hotpluggable-cpus to get a list of CPUs
(which includes the QOM path of already present CPUs) instead of
hard-coding some 'well-known' path, as there is no guarantee
that it will stay stable whatsoever.

> > Note: This (intentionally) breaks compatibility with current
> > libvirt code that looks for "/machine/unattached/device[0]"
> > in the assumption it is the first CPU.
>
> Why does libvirt do this in the first place?
>
> > Cc: libvir-list@redhat.com
> > Suggested-by: Daniel P. Berrangé
> > Reviewed-by: Daniel P. Berrangé
> > Signed-off-by: Philippe Mathieu-Daudé
> > ---
> >  hw/i386/x86.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index b84840a1bb9..50bf249c700 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -108,6 +108,7 @@ void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
> >  {
> >      Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
> >
> > +    object_property_add_child(OBJECT(x86ms), "cpu[*]", OBJECT(cpu));
>
> that will take into account only initial cpus; -device/device_add cpus
> will still go to wherever device_add attaches them (see qdev_set_id)
>
> >      if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
> >          goto out;
> >      }
Re: [PATCH v4 2/4] hw/i386: Attach CPUs to machine
On Sat, 5 Feb 2022 13:45:24 +0100 Philippe Mathieu-Daudé wrote:
> Previously CPUs were exposed in the QOM tree at a path
>
>   /machine/unattached/device[nn]
>
> where the 'nn' of the first CPU is usually zero, but can
> vary depending on what devices were already created.
>
> With this change the CPUs are now at
>
>   /machine/cpu[nn]
>
> where the 'nn' of the first CPU is always zero.

Could you add to commit message the reason behind the change?

> Note: This (intentionally) breaks compatibility with current
> libvirt code that looks for "/machine/unattached/device[0]"
> in the assumption it is the first CPU.

Why does libvirt do this in the first place?

> Cc: libvir-list@redhat.com
> Suggested-by: Daniel P. Berrangé
> Reviewed-by: Daniel P. Berrangé
> Signed-off-by: Philippe Mathieu-Daudé
> ---
>  hw/i386/x86.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index b84840a1bb9..50bf249c700 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -108,6 +108,7 @@ void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
>  {
>      Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
>
> +    object_property_add_child(OBJECT(x86ms), "cpu[*]", OBJECT(cpu));

that will take into account only initial cpus; -device/device_add cpus
will still go to wherever device_add attaches them (see qdev_set_id)

>      if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
>          goto out;
>      }
Re: [PATCH 5/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Wed, 20 Oct 2021 16:15:29 +0200 Michal Prívozník wrote:
> On 10/20/21 1:18 PM, Peter Krempa wrote:
> > On Wed, Oct 20, 2021 at 13:07:59 +0200, Michal Prívozník wrote:
> >> On 10/6/21 3:32 PM, Igor Mammedov wrote:
> >>> On Thu, 30 Sep 2021 14:08:34 +0200
> >>> Peter Krempa wrote:
> >
> > [...]
> >
> >> 2) In my experiments I try to mimic what libvirt does. Here's my cmd
> >> line:
> >>
> >>   qemu-system-x86_64 \
> >>     -S \
> >>     -preconfig \
> >>     -cpu host \
> >>     -smp 120,sockets=2,dies=3,cores=4,threads=5 \
> >>     -object '{"qom-type":"memory-backend-memfd","id":"ram-node0","size":4294967296,"host-nodes":[0],"policy":"bind"}' \
> >>     -numa node,nodeid=0,memdev=ram-node0 \
> >>     -no-user-config \
> >>     -nodefaults \
> >>     -no-shutdown \
> >>     -qmp stdio
> >>
> >> and here is my QMP log:
> >>
> >>   {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 6}, "package": "v6.1.0-1552-g362534a643"}, "capabilities": ["oob"]}}
> >>
> >>   {"execute":"qmp_capabilities"}
> >>   {"return": {}}
> >>
> >>   {"execute":"query-hotpluggable-cpus"}
> >>   {"return": [{"props": {"core-id": 3, "thread-id": 4, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 3, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 2, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 1, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 3, "thread-id": 0, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>     {"props": {"core-id": 2, "thread-id": 4, "die-id": 2, "socket-id": 1}, "vcpus-count": 1, "type": "host-x86_64-cpu"},
> >>
> >>     {"props": {"core-id": 0, "thread-id": 0, "die-id": 0, "socket-id": 0}, "vcpus-count": 1, "type": "host-x86_64-cpu"}]}
> >>
> >> I can see that query-hotpluggable-cpus returns an array. Can I safely
> >> assume that vCPU ID == index in the array? I mean, if I did have -numa
> >
> > No, this assumption would be incorrect on the aforementioned PPC
> > platform where one entry in the returned array can describe multiple
> > cores.
> >
> > qemuDomainFilterHotplugVcpuEntities is the code that cross-references
> > the libvirt "index" with the data returned by query-hotpluggable-cpus.
> >
> > The important bit is the 'vcpus-count' property. The code which deals
> > with hotplug is already fetching everything that's needed.
>
> Ah, I see. So my assumption would be correct if vcpus-count would be 1
> for all entries. If it isn't then I need to account for how much

Only for some boards. An entry in the array describes a single entity
that should be handled as a single device by the user
(-device/plug/unplug/other mapping options), and the entity might have
1 or more vCPUs (threads) depending on the target arch/board.

> vcpus-count is in each entity. Fair enough. But
> qemuDomainFilterHotplugVcpuEntities() doesn't really do vCPU ID ->
> [socket, core, thread] translation, does it?
>
> But even if it did, I am still wondering what the purpose of this whole
> exercise is. QEMU won't be able to drop the ID -> [socket, core, thread]
> mapping. The only thing it would be able to drop is a few lines of code
> handling the command line. Am I missing something obvious?

I described in another email why QEMU is dropping cpu_index on external
interfaces (it's possible to drop it internally too, but I don't see
much gain there vs the effort such refactoring would require).

Sure thing, you can invent/maintain a libvirt-internal "vCPU ID" ->
[topo props] mapping if it's necessary. However, using just a "vCPU ID"
will obscure topology information from upper layers. Maybe providing a
list of CPUs as an external interface would be better; then users can
pick which CPUs they wish to add/delete/assign/... using items from
that list.

> Michal
>
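The `vcpus-count` bookkeeping discussed above can be sketched as a vCPU-ID-to-entry mapping. The helper and sample data are invented for illustration; the PPC64-like shape (four cores of eight threads exposed as four entries) follows the example earlier in the thread:

```python
def vcpu_id_map(entries):
    """Map sequential vCPU IDs onto indices of query-hotpluggable-cpus
    entries, honouring 'vcpus-count': on some boards (e.g. PPC64 with
    guest-only threads) one entry covers a whole core's worth of vCPUs."""
    mapping, vcpu = {}, 0
    for idx, entry in enumerate(entries):
        for _ in range(entry.get("vcpus-count", 1)):
            mapping[vcpu] = idx
            vcpu += 1
    return mapping

# Four entries, each covering eight threads (32 vCPUs total).
ppc_like = [{"props": {"core-id": c}, "vcpus-count": 8}
            for c in (0, 8, 16, 24)]
m = vcpu_id_map(ppc_like)
print(m[0], m[7], m[8], m[31])
# -> 0 0 1 3
```

On x86, where every entry has `vcpus-count` of 1, the mapping degenerates to the identity, which is why the naive index assumption happens to work there.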
Re: [PATCH 5/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Wed, 20 Oct 2021 13:07:59 +0200 Michal Prívozník wrote:
> On 10/6/21 3:32 PM, Igor Mammedov wrote:
> > On Thu, 30 Sep 2021 14:08:34 +0200
> > Peter Krempa wrote:
> >
> >> On Tue, Sep 21, 2021 at 16:50:31 +0200, Michal Privoznik wrote:
> >>> QEMU is trying to obsolete -numa node,cpus= because that uses
> >>> ambiguous vCPU id to [socket, die, core, thread] mapping. The new
> >>> form is:
> >>>
> >>>   -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T
> >>>
> >>> which is repeated for every vCPU and places it at [S, D, C, T]
> >>> into guest NUMA node N.
> >>>
> >>> While in general this is magic mapping, we can deal with it.
> >>> Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology
> >>> is given then maxvcpus must be sockets * dies * cores * threads
> >>> (i.e. there are no 'holes').
> >>> Secondly, if no topology is given then libvirt itself places each
> >>> vCPU into a different socket (basically, it fakes topology of:
> >>> [maxvcpus, 1, 1, 1])
> >>> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs
> >>> onto topology, to make sure vCPUs don't start to move around.
> >>
> >> There's a problem with this premise though and unfortunately we don't
> >> seem to have qemuxml2argvtest for it.
> >>
> >> On PPC64, in certain situations the CPU can be configured such that
> >> threads are visible only to VMs. This has substantial impact on how CPUs
> >> are configured using the modern parameters (until now used only for
> >> cpu hotplug purposes, and that's the reason vCPU hotplug has such
> >> complicated incantations when starting the VM).
> >>
> >> In the above situation a CPU with topology of:
> >>   sockets=1, cores=4, threads=8 (thus 32 cpus)
> >>
> >> will only expose 4 CPU "devices":
> >>
> >>   core-id: 0, core-id: 8, core-id: 16 and core-id: 24
> >>
> >> yet the guest will correctly see 32 cpus when used as such.
> >>
> >> You can see this in:
> >>
> >>   tests/qemuhotplugtestcpus/ppc64-modern-individual-monitor.json
> >>
> >> Also note that the 'props' object does _not_ have any socket-id, and
> >> management apps are supposed to pass in 'props' as is. (There's a bunch
> >> of code to do that on hotplug.)
> >>
> >> The problem is that you need to query the topology first (unless we want
> >> to duplicate all of the qemu code that has to do with topology state and
> >> keep up with changes to it) to know how it's behaving on the current
> >> machine. This historically was not possible. The supposed solution for
> >> this was the pre-config state where we'd be able to query and set it up
> >> via QMP, but I was not keeping up sufficiently with that work, so I
> >> don't know if it's possible.
> >>
> >> If preconfig is a viable option we IMO should start using it sooner
> >> rather than later and avoid duplicating qemu's logic here.
> >
> > Using preconfig is the preferable variant, otherwise libvirt
> > would end up duplicating topology logic which differs not only
> > between targets but also between machine/cpu types.
> >
> > The closest example of how to use preconfig is in the
> > pc_dynamic_cpu_cfg() test case. Though it uses query-hotpluggable-cpus
> > only for verification, one can use the command at the preconfig
> > stage to get the topology for a given -smp/-machine type combination.
>
> Alright, -preconfig should be pretty easy. However, I do have some
> points to raise/ask:
>
> 1) currently, exit-preconfig is marked as experimental (hence its "x-"
> prefix). Before libvirt consumes it, QEMU should make it stable. Is
> there anything that stops QEMU from doing so or is it just a matter of
> sending patches (I volunteer to do that)?

If I recall correctly, it was made experimental due to the lack of
actual users (it was supposed that libvirt would consume it once
available, but that didn't happen for quite a long time).
So patches to make it a stable interface should be fine.

> 2) In my experiments I try to mimic what libvirt does. Here's my cmd
> line:
>
>   qemu-system-x86_64 \
>     -S \
>     -preconfig \
>     -cpu host \
>     -smp 120,sockets=2,dies=3,cores=4,threads=5 \
>     -object '{"qom-type":"memory-backend-memfd","id":"ram-node0","
Re: [PATCH 4/5] qemuBuildNumaCommandLine: Separate out building of CPU list
On Thu, 30 Sep 2021 13:33:24 +0200 Peter Krempa wrote:
> On Tue, Sep 21, 2021 at 16:50:30 +0200, Michal Privoznik wrote:
> > Signed-off-by: Michal Privoznik
> > ---
> >  src/qemu/qemu_command.c | 43 ++---
> >  1 file changed, 27 insertions(+), 16 deletions(-)
>
> Reviewed-by: Michal Privoznik

^^^ copy-paste error :)
Re: [PATCH 5/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Thu, 30 Sep 2021 14:08:34 +0200 Peter Krempa wrote:
> On Tue, Sep 21, 2021 at 16:50:31 +0200, Michal Privoznik wrote:
> > QEMU is trying to obsolete -numa node,cpus= because that uses
> > ambiguous vCPU id to [socket, die, core, thread] mapping. The new
> > form is:
> >
> >   -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T
> >
> > which is repeated for every vCPU and places it at [S, D, C, T]
> > into guest NUMA node N.
> >
> > While in general this is magic mapping, we can deal with it.
> > Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology
> > is given then maxvcpus must be sockets * dies * cores * threads
> > (i.e. there are no 'holes').
> > Secondly, if no topology is given then libvirt itself places each
> > vCPU into a different socket (basically, it fakes topology of:
> > [maxvcpus, 1, 1, 1])
> > Thirdly, we can copy whatever QEMU is doing when mapping vCPUs
> > onto topology, to make sure vCPUs don't start to move around.
>
> There's a problem with this premise though and unfortunately we don't
> seem to have qemuxml2argvtest for it.
>
> On PPC64, in certain situations the CPU can be configured such that
> threads are visible only to VMs. This has substantial impact on how CPUs
> are configured using the modern parameters (until now used only for
> cpu hotplug purposes, and that's the reason vCPU hotplug has such
> complicated incantations when starting the VM).
>
> In the above situation a CPU with topology of:
>   sockets=1, cores=4, threads=8 (thus 32 cpus)
>
> will only expose 4 CPU "devices":
>
>   core-id: 0, core-id: 8, core-id: 16 and core-id: 24
>
> yet the guest will correctly see 32 cpus when used as such.
>
> You can see this in:
>
>   tests/qemuhotplugtestcpus/ppc64-modern-individual-monitor.json
>
> Also note that the 'props' object does _not_ have any socket-id, and
> management apps are supposed to pass in 'props' as is. (There's a bunch
> of code to do that on hotplug.)
>
> The problem is that you need to query the topology first (unless we want
> to duplicate all of the qemu code that has to do with topology state and
> keep up with changes to it) to know how it's behaving on the current
> machine. This historically was not possible. The supposed solution for
> this was the pre-config state where we'd be able to query and set it up
> via QMP, but I was not keeping up sufficiently with that work, so I
> don't know if it's possible.
>
> If preconfig is a viable option we IMO should start using it sooner
> rather than later and avoid duplicating qemu's logic here.

Using preconfig is the preferable variant, otherwise libvirt
would end up duplicating topology logic which differs not only
between targets but also between machine/cpu types.

The closest example of how to use preconfig is in the
pc_dynamic_cpu_cfg() test case. Though it uses query-hotpluggable-cpus
only for verification, one can use the command at the preconfig
stage to get the topology for a given -smp/-machine type combination.

> > Note, migration from old to new cmd line works and therefore
> > doesn't need any special handling.
> >
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085
> > Signed-off-by: Michal Privoznik
> > ---
> >  src/qemu/qemu_command.c                       | 112 +-
> >  .../hugepages-nvdimm.x86_64-latest.args       |   4 +-
> >  ...memory-default-hugepage.x86_64-latest.args |  10 +-
> >  .../memfd-memory-numa.x86_64-latest.args      |  10 +-
> >  ...y-hotplug-nvdimm-access.x86_64-latest.args |   4 +-
> >  ...ory-hotplug-nvdimm-align.x86_64-5.2.0.args |   4 +-
> >  ...ry-hotplug-nvdimm-align.x86_64-latest.args |   4 +-
> >  ...ory-hotplug-nvdimm-label.x86_64-5.2.0.args |   4 +-
> >  ...ry-hotplug-nvdimm-label.x86_64-latest.args |   4 +-
> >  ...mory-hotplug-nvdimm-pmem.x86_64-5.2.0.args |   4 +-
> >  ...ory-hotplug-nvdimm-pmem.x86_64-latest.args |   4 +-
> >  ...-hotplug-nvdimm-readonly.x86_64-5.2.0.args |   4 +-
> >  ...hotplug-nvdimm-readonly.x86_64-latest.args |   4 +-
> >  .../memory-hotplug-nvdimm.x86_64-latest.args  |   4 +-
> >  ...mory-hotplug-virtio-pmem.x86_64-5.2.0.args |   4 +-
> >  ...ory-hotplug-virtio-pmem.x86_64-latest.args |   4 +-
> >  .../numatune-hmat.x86_64-latest.args          |  18 ++-
> >  ...emnode-restrictive-mode.x86_64-latest.args |  38 +-
> >  .../numatune-memnode.x86_64-5.2.0.args        |  38 +-
> >  .../numatune-memnode.x86_64-latest.args       |  38 +-
> >  ...vhost-user-fs-fd-memory.x86_64-latest.args |   4 +-
> >  ...vhost-user-fs-hugepages.x86_64-latest.args |   4 +-
> >  ...host-user-gpu-secondary.x86_64-latest.args |   3 +-
> >  .../vhost-user-vga.x86_64-latest.args         |   3 +-
> >  24 files changed, 296 insertions(+), 34 deletions(-)
> >
> > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> > index f04ae1e311..5192bd7630 100644
> > --- a/src/qemu/qemu_command.c
> > +++ b/src/qemu/qemu_command.c
>
> [...]
>
> > @@ -7432,6 +7432,94 @@ qemuBuildNumaCPUs(virBuffer *buf,
>
Re: [PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and acpi-root-hotplug pm options
On Tue, 28 Sep 2021 11:47:26 +0100 Daniel P. Berrangé wrote: > On Tue, Sep 28, 2021 at 03:28:04PM +0530, Ani Sinha wrote: > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > On Tue, Sep 28, 2021 at 02:35:47PM +0530, Ani Sinha wrote: > > > > > > > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > > > > > On Sun, Sep 12, 2021 at 08:56:29AM +0530, Ani Sinha wrote: > > > > > > This change introduces libvirt xml support for the following two pm > > > > > > options: [...] > > > > The switch in libvirt for pcie-root-ports > > > > currently does not care whether native or acpi hotplug is used. It > > > > simply > > > > turns on the hotplug for that particular port. Whether ACPI or native is > > > > used is controlled by this global flag that Julia has introduced in > > > > 6.1. > > > Right so we have > > > *1*) following applies to piix4/q35: * ACPI hotplug when enabled, affects _only_ cold-plugged 'bridges' since it requires 'slots' being described in DSDT table which in current impl. is static table built at reset time. (i.e. built-in or 'bridges' specified on command line, where 'bridges' could be PCI-PCI or PCIe-PCI or root/downstream-ports') for anything else ('bridges' added with device_add) native hotplug is in use (whether it's SHPC or PCI-E native). ACPI hotplug wiring is done by calling qbus_set_hotplug_handler() * for root bus piix4_pm_realize()/ich9_pm_init() * for anything else acpi_pcihp_device_plug_cb() > > > * PIIX4 > > > > > > - acpi-root-pci-hotplug=bool > > > > > > Whether hotplug is enabled for the root bridge or not > > > > > >for pci-root controller > > > > > > > > > - acpi-pci-hotplug-with-bridge-support=bool > > > > > > Toggles support for ACPI based hotplug across all bridges. > > > If disabled will there will be no hotplug at all for PIIX4 ? > > > Or does 'shpc' come into play in that scenario ? 'SHPC' hotplug kicks in if it's enabled. 
(defaults to 'on' except the 2.9 machine type) On the q35/ACPI side of things we always advertise _all_ available hotplug methods, see build_q35_osc_method(): /* * Always allow native PME, AER (no dependencies) * Allow SHPC (PCI bridges can have SHPC controller) */ aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl)); bits 0, 1 are Native PCI-E hotplug and SHPC respectively. For PIIX4 we don't have _OSC, so it's up to the guest OS to make up the supported methods. In order of preference: * Windows supports ACPI hotplug, then Native PCI-E (SHPC never worked there) * Linux supports ACPI hotplug, SHPC, Native PCI-E (SHPC worked poorly due to the need to reserve IO for bridges; IO reservation hinting was implemented later by Marcel) > > >PIIX combinations > > > > > >(1) acpi-root-pci-hotplug=yes > > >acpi-pci-hotplug-with-bridge-support=yes > > > > > > - All bridges have hotplug > > > > > >(2) acpi-root-pci-hotplug=yes > > >acpi-pci-hotplug-with-bridge-support=no > > > > > > - No bridges have hotplug > > > > > >(3) acpi-root-pci-hotplug=no > > >acpi-pci-hotplug-with-bridge-support=yes > > > > > > - All bridges except root have hotplug requested by the Proxmox guys, to battle a Windows 'feature' that lets any user unplug the sole NIC using an icon on the taskbar. (Laine mentioned we have similar per-port control for PCI-E (the 'hotplug' property) that was requested by other users, probably for the same reason.) So acpi-root-pci-hotplug is similar to pcie-root-port.hotplug, with the difference that the former applies to the whole root bus on PIIX4, while the latter can be controlled per root port. > > >(4) acpi-root-pci-hotplug=no > > >acpi-pci-hotplug-with-bridge-support=no > > > > > > - No bridges have hotplug. Essentially identical to (2) > > > > no (4) is not identical to (2). In (4) no hotplug is enabled. In (2) pci > > root bus still has hotplug enabled. 
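For reference, the 0x1F mask in the quoted build_q35_osc_method() snippet corresponds to the low five bits of the _OSC control field. A quick sketch decoding it (bit meanings per the PCI Firmware spec; the short labels are mine):

```python
# _OSC control-field bits (PCI Firmware spec); QEMU's 0x1F grants all five.
OSC_PCIE_NATIVE_HOTPLUG = 1 << 0
OSC_SHPC_NATIVE_HOTPLUG = 1 << 1
OSC_PCIE_NATIVE_PME     = 1 << 2
OSC_AER                 = 1 << 3
OSC_PCIE_CAP_CONTROL    = 1 << 4

def granted_methods(ctrl):
    """Return the names of the _OSC control bits granted to the guest OS."""
    names = [
        (OSC_PCIE_NATIVE_HOTPLUG, "pcie-native-hotplug"),
        (OSC_SHPC_NATIVE_HOTPLUG, "shpc-hotplug"),
        (OSC_PCIE_NATIVE_PME, "native-pme"),
        (OSC_AER, "aer"),
        (OSC_PCIE_CAP_CONTROL, "pcie-cap-structure"),
    ]
    return [name for bit, name in names if ctrl & bit]
```

This is why the discussion focuses on bits 0 and 1: they are the two hotplug methods (native PCIe and SHPC) the guest can be granted alongside, or instead of, ACPI hotplug.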
> > So you're saying that acpi-root-pci-hotplug=yes overrides the > global request acpi-pci-hotplug-with-bridge-support=no and > turns ACPI hotplug back on for the pcie-root historically ACPI hotplug on root bus was always supported without any option, i.e. acpi-root-pci-hotplug=yes by default. acpi-pci-hotplug-with-bridge-support does what its name claims - i.e. adds hotplug for bridges (at least on PIIX4). > > > * Q35 clarification [*1*] still applies > > > > > > > > > - acpi-pci-hotplug-with-bridge-support=bool > > > > > > Toggles support for ACPI based hotplug. If disabled native > > > PCIe hotplug is activated instead > > > > > > > > > * pcie-root-port > > > > > > - hotplug=bool > > > > > > Toggle
Re: [PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and acpi-root-hotplug pm options
On Tue, 28 Sep 2021 11:59:42 +0100 Daniel P. Berrangé wrote: > On Tue, Sep 28, 2021 at 11:47:26AM +0100, Daniel P. Berrangé wrote: > > On Tue, Sep 28, 2021 at 03:28:04PM +0530, Ani Sinha wrote: > > > > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > > > On Tue, Sep 28, 2021 at 02:35:47PM +0530, Ani Sinha wrote: > > > > > > > > > > > > > > > On Tue, 28 Sep 2021, Daniel P. Berrangé wrote: > > > > > > > > > > > On Sun, Sep 12, 2021 at 08:56:29AM +0530, Ani Sinha wrote: > > > > > > > This change introduces libvirt xml support for the following two > > > > > > > pm options: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +``acpi-hotplug-bridge`` > > > > > > > + :since:`Since 7.8.0` This option enables or disables BIOS > > > > > > > ACPI based hotplug support > > > > > > > + for cold plugged bridges. It is available only for x86 > > > > > > > guests, both for q35 and pc > > > > > > > + machine types. For pc machines, the support is available from > > > > > > > `QEMU 2.12`. For q35 > > > > > > > + machines, the support is available from `QEMU 6.1`. Examples > > > > > > > of cold plugged bridges > > > > > > > + include PCI-PCI bridges for pc machine types (pci-bridge > > > > > > > controller). For q35 machines, > > > > > > > + it includes PCIE root ports (pcie-root-port controller). This > > > > > > > is a global option that > > > > > > > + affects all bridges. No other bridge specific option is > > > > > > > required to be specified. > > > > > > > > > > > > Can you confirm my understanding of the situation.. > > > > > > > > > > > > - i440fx / PCI topology - hotplug always uses ACPI > > > > > > > > > > > > > > > > ACPI is the primary means of enabling hotplug. shpc might also have a > > > > > role > > > > > here but I think it is disabled. Igor (cc'd) might throw some lights > > > > > on > > > > > how shpc comes to play. 
> > > > > > > > Yes, I think it will be important to understand if 'shpc' becomes > > > > relevant > > > > when ACPI hotplug is disabled for PCI > > > > > > > > > > > > > > > - q35 / PCIe topology - hotplug historically used native PCIe > > > > > > hotplug, > > > > > > but in 6.1 switched to ACPI > > > > > > > > > > > > > > > > Correct. > > > > > > > > > > > Given, the name "acpi-hotplug-bridge", am I right that this option > > > > > > has *no* effect, if the q35 machine is using native PCIe hotplug > > > > > > approach ? > > > > > > > > > > Its complicated. > > > > > With "acpi-hotplug-bridge" ON, native hotplug is disabled in qemu. > > > > > With "acpi-hotplug-bridge" OFF, native hotplug is enabled in qemu. > > > > > > > > Oh, I mis-read and didn't realize this was controlling the QEMU > > > > "acpi-pci-hotplug-with-bridge-support" configuration. > > > > > > > > With this in mind I think the naming is somewhat misleading. Setting it > > > > to off would give users the impression that hotplug is disabled, which > > > > is not the case for Q35 at least. It is just switching to a different > > > > hotplug implementation. > > > > > > > > At least from Q35 pov, I think it would be better to call it > > > > > > > > hotplug-mode="acpi|pcie" > > > > > > > > so it is clear that no matter what value it is set to, hotplug > > > > is still available. > > > > > > > > If we also consider PIIX, then depending on the answer wrt shpc > > > > above, we might want one of > > > > > > > > hotplug-mode="acpi|pcie|none" > > > > hotplug-mode="acpi|pcie|shpc" > > > > > > > > > > If libvirt does not deal with shpc today I think we should not bother with > > > shpc at all. We should simply have a boolean mode appropriately named that > > > choses between acpi hotplug vs native. > > > > I want to understand what's possible at the qemu hardware level, > > so we don't paint ourselves into a corner. 
> > > > IIUC, with shpc we only have a toggle on "pci-bridge" devices, > > and those currently have shpc=true by default. There's no shpc > > setting on the pci-root, and there's no global setting. > > Oops, I was misled. They have shpc=false by default due to machine > types >= 2.9 overriding it to false If I read it correctly, shpc is on by default (modulo 2.9, see commit 2fa356629ed2) > > > Seems to imply that if we have acpi-hotplug disabled for PIIX, > > then there would be no hotplug on the pci-root, but shpc hotplug > > would still be available on any pci-bridge devices ? > > Regards, > Daniel
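Putting the four PIIX4 combinations from this thread in one place: the two switches act independently, one governing the root bus and the other the cold-plugged bridges. A simplified model of the discussion (not QEMU code):

```python
def piix4_acpi_hotplug(root_hotplug, bridge_support):
    """Map the two PIIX4 switches to where ACPI hotplug is active.
    Per the thread: combination (2) root=on/bridges=off still keeps
    hotplug on the root bus, which is why it differs from (4)."""
    return {
        "pci-root": root_hotplug,                # acpi-root-pci-hotplug
        "cold-plugged bridges": bridge_support,  # acpi-pci-hotplug-with-bridge-support
    }
```

Whether SHPC or native hotplug then fills in for buses without ACPI hotplug is up to the guest OS, as discussed above.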
Re: [PATCH] Deprecate pmem=on with non-DAX capable backend file
On Wed, 28 Apr 2021 12:29:30 -0400 Eduardo Habkost wrote: > On Tue, Apr 27, 2021 at 04:48:48PM -0400, Eduardo Habkost wrote: > > On Mon, Jan 11, 2021 at 03:33:32PM -0500, Igor Mammedov wrote: > > > It is not safe to pretend that emulated NVDIMM supports > > > persistence while backend actually failed to enable it > > > and used non-persistent mapping as fall back. > > > Instead of falling-back, QEMU should be more strict and > > > error out with clear message that it's not supported. > > > So if user asks for persistence (pmem=on), they should > > > store backing file on NVDIMM. > > > > > > Signed-off-by: Igor Mammedov > > > Reviewed-by: Philippe Mathieu-Daudé > > > > I'm queueing this for 6.1, after changing "since 6.0" to "since 6.1". > > > > Sorry for letting it fall through the cracks. > > This caused build failures[1] and I had to apply the following > fixup. Thanks! > > [1] https://gitlab.com/ehabkost/qemu/-/jobs/1216917482#L3444 > > Signed-off-by: Eduardo Habkost > --- > docs/system/deprecated.rst | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > index cc8d810be1a..c55c4bceb00 100644 > --- a/docs/system/deprecated.rst > +++ b/docs/system/deprecated.rst > @@ -257,6 +257,7 @@ is (a) not DAX capable or (b) not on a filesystem that > support direct mapping > of persistent memory, is not safe and may lead to data loss or corruption in > case > of host crash. > Options are: > + > - modify VM configuration to set ``pmem=off`` to continue using fake > NVDIMM >(without persistence guaranties) with backing file on non DAX storage > - move backing file to NVDIMM storage and keep ``pmem=on``
Re: [libvirt PATCH 1/6] conf: add support for for PCI devices
On Thu, 8 Apr 2021 09:39:43 +0100 Daniel P. Berrangé wrote: > On Wed, Apr 07, 2021 at 10:23:37PM +0200, Igor Mammedov wrote: > > On Wed, 7 Apr 2021 13:40:03 +0100 > > Daniel P. Berrangé wrote: > > > > > On Wed, Apr 07, 2021 at 09:17:36AM +0200, Peter Krempa wrote: > > > > On Tue, Apr 06, 2021 at 16:31:32 +0100, Daniel Berrange wrote: > > > > > PCI devices can be associated with a unique integer index that is > > > > > exposed via ACPI. In Linux OS with systemd, this value is used for > > > > > provide a NIC device naming scheme that is stable across changes > > > > > in PCI slot configuration. > > > > > > > > > > Signed-off-by: Daniel P. Berrangé > > > > > --- > > > > > docs/formatdomain.rst | 6 +++ > > > > > docs/schemas/domaincommon.rng | 73 > > > > > +++ > > > > > src/conf/device_conf.h| 3 ++ > > > > > src/conf/domain_conf.c| 12 ++ > > > > > 4 files changed, 94 insertions(+) > > > > > > > > > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst > > > > > index 7ba32ea9c1..5db0aac77a 100644 > > > > > --- a/docs/formatdomain.rst > > > > > +++ b/docs/formatdomain.rst > > > > > @@ -4363,6 +4363,7 @@ Network interfaces > > > > > > > > > > > > > > > > > > > > + > > > > > > > > > > > > > > > ... > > > > > @@ -4389,6 +4390,11 @@ when it's in the reserved VMware range by > > > > > adding a ``type="static"`` attribute > > > > > to the element. Note that this attribute is useless if > > > > > the provided > > > > > MAC address is outside of the reserved VMWare ranges. > > > > > > > > > > +:since:`Since 7.3.0`, one can set the ACPI index against network > > > > > interfaces. > > > > > +With some operating systems (eg Linux with systemd), the ACPI index > > > > > is used > > > > > +to provide network interface device naming, that is stable across > > > > > changes > > > > > +in PCI addresses assigned to the device. > > > > > > > > Any range limits or uniqueness requirements worth mentioning? 
> > > > > > Yes, its required to be unique and below (16 * 1024 - 1) because > > > for some reason QEMU chose to artificially limit its value to > > > match systemd's limit. This is a bit dubious IMHO, as the host > > > should not enforce policy for things that are decided by the > > > guest. > > dropping limit would just postpone error till guest boots > > with effect that 'oboard' naming won't be used and systemd > > will fallback to the next available method. > > That's no big deal - the user will easily see this and change their > config. It is a mere docs problem at most. > > > Given that systemd is the sole known user of this feature, > > it seemed better to me to error out at QEMU start rather than > > waiting till guests boots and let user figure out what's wrong. > > > > If we find another user for the feature that supports full range > > we can drop limit easily without any compat issues. > > There must be other users of this feature, given that we're using > a facility that is part of a formal ACPI specification that existed > before systemd had this feature. Given that I think it is very > bad practice to apply a limit host side that's tied to a single > guest usecase, regardless of whether we happen to know about the > other users. We're basically creating a bug in QEMU upfront that > doesn't need to exist. Ok, I'll post a patch to remove limit once 6.1 dev window is open. > > Regards, > Daniel
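The limit under discussion, sketched as the validation a client could do up front. The 16 * 1024 - 1 cap and the uniqueness requirement come from the mail above; the constant name and the function are mine, not QEMU's:

```python
ACPI_INDEX_MAX = 16 * 1024 - 1  # cap chosen to match systemd's onboard-index limit

def check_acpi_index(index, already_assigned):
    """Front-load the checks described above: acpi-index must be
    non-zero, unique per machine, and (while the QEMU-side cap is in
    place) not exceed systemd's limit.  Sketch only -- QEMU enforces
    this itself when the device is realized."""
    if not 0 < index <= ACPI_INDEX_MAX:
        raise ValueError(f"acpi-index must be in 1..{ACPI_INDEX_MAX}")
    if index in already_assigned:
        raise ValueError(f"acpi-index {index} is already in use")
    already_assigned.add(index)
```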
Re: [libvirt PATCH 1/6] conf: add support for for PCI devices
On Wed, 7 Apr 2021 13:40:03 +0100 Daniel P. Berrangé wrote: > On Wed, Apr 07, 2021 at 09:17:36AM +0200, Peter Krempa wrote: > > On Tue, Apr 06, 2021 at 16:31:32 +0100, Daniel Berrange wrote: > > > PCI devices can be associated with a unique integer index that is > > > exposed via ACPI. In Linux OS with systemd, this value is used for > > > provide a NIC device naming scheme that is stable across changes > > > in PCI slot configuration. > > > > > > Signed-off-by: Daniel P. Berrangé > > > --- > > > docs/formatdomain.rst | 6 +++ > > > docs/schemas/domaincommon.rng | 73 +++ > > > src/conf/device_conf.h| 3 ++ > > > src/conf/domain_conf.c| 12 ++ > > > 4 files changed, 94 insertions(+) > > > > > > diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst > > > index 7ba32ea9c1..5db0aac77a 100644 > > > --- a/docs/formatdomain.rst > > > +++ b/docs/formatdomain.rst > > > @@ -4363,6 +4363,7 @@ Network interfaces > > > > > > > > > > > > + > > > > > > > > > ... > > > @@ -4389,6 +4390,11 @@ when it's in the reserved VMware range by adding a > > > ``type="static"`` attribute > > > to the element. Note that this attribute is useless if the > > > provided > > > MAC address is outside of the reserved VMWare ranges. > > > > > > +:since:`Since 7.3.0`, one can set the ACPI index against network > > > interfaces. > > > +With some operating systems (eg Linux with systemd), the ACPI index is > > > used > > > +to provide network interface device naming, that is stable across changes > > > +in PCI addresses assigned to the device. > > > > Any range limits or uniqueness requirements worth mentioning? > > Yes, its required to be unique and below (16 * 1024 - 1) because > for some reason QEMU chose to artificially limit its value to > match systemd's limit. This is a bit dubious IMHO, as the host > should not enforce policy for things that are decided by the > guest. 
Dropping the limit would just postpone the error until the guest boots, with the effect that 'onboard' naming won't be used and systemd will fall back to the next available method. Given that systemd is the sole known user of this feature, it seemed better to me to error out at QEMU start rather than wait until the guest boots and let the user figure out what's wrong. If we find another user of the feature that supports the full range, we can drop the limit easily without any compat issues. > > > Regards, > Daniel
Re: [libvirt PATCH 5/6] qemu: probe for "acpi-index" property
On Tue, 6 Apr 2021 16:31:36 +0100 Daniel P. Berrangé wrote: > This property is exposed by QEMU on any PCI device, but we have to pick > some specific device(s) to probe it against. We expect that at least one > of the virtio devices will be present, so probe against them. Would it be useful to expose capability with MachineInfo in QAPI schema? At least with this on QEMU side I can imagine a crude check and error out in case device has acpi-index set but machine doesn't support it. > > Signed-off-by: Daniel P. Berrangé > --- > src/qemu/qemu_capabilities.c | 8 > src/qemu/qemu_capabilities.h | 3 +++ > tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + > 3 files changed, 12 insertions(+) > > diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c > index ea24e2d6a5..f44a06c5c9 100644 > --- a/src/qemu/qemu_capabilities.c > +++ b/src/qemu/qemu_capabilities.c > @@ -625,6 +625,9 @@ VIR_ENUM_IMPL(virQEMUCaps, >"blockdev-backup", >"object.qapified", >"rotation-rate", > + > + /* 400 */ > + "acpi-index", > ); > > > @@ -1363,6 +1366,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioBalloon[] > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > { "free-page-reporting", QEMU_CAPS_VIRTIO_BALLOON_FREE_PAGE_REPORTING, > NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > > @@ -1395,6 +1399,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioBlk[] = { > { "write-cache", QEMU_CAPS_DISK_WRITE_CACHE, NULL }, > { "werror", QEMU_CAPS_STORAGE_WERROR, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVirtioNet[] > = { > @@ -1408,6 +1413,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioNet[] = { > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "failover", QEMU_CAPS_VIRTIO_NET_FAILOVER, NULL }, > { 
"packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsPCIeRootPort[] = { > @@ -1428,6 +1434,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioSCSI[] = { > { "iommu_platform", QEMU_CAPS_VIRTIO_PCI_IOMMU_PLATFORM, NULL }, > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsVfioPCI[] = { > @@ -1499,6 +1506,7 @@ static struct virQEMUCapsDevicePropsFlags > virQEMUCapsDevicePropsVirtioGpu[] = { > { "iommu_platform", QEMU_CAPS_VIRTIO_PCI_IOMMU_PLATFORM, NULL }, > { "ats", QEMU_CAPS_VIRTIO_PCI_ATS, NULL }, > { "packed", QEMU_CAPS_VIRTIO_PACKED_QUEUES, NULL }, > +{ "acpi-index", QEMU_CAPS_ACPI_INDEX, NULL }, > }; > > static struct virQEMUCapsDevicePropsFlags virQEMUCapsDevicePropsICH9[] = { > diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h > index a70c00a265..22ff3a2f15 100644 > --- a/src/qemu/qemu_capabilities.h > +++ b/src/qemu/qemu_capabilities.h > @@ -606,6 +606,9 @@ typedef enum { /* virQEMUCapsFlags grouping marker for > syntax-check */ > QEMU_CAPS_OBJECT_QAPIFIED, /* parameters for object-add are formally > described */ > QEMU_CAPS_ROTATION_RATE, /* scsi-disk / ide-drive rotation-rate prop */ > > +/* 400 */ > +QEMU_CAPS_ACPI_INDEX, /* PCI device 'acpi-index' property */ > + > QEMU_CAPS_LAST /* this must always be the last item */ > } virQEMUCapsFlags; > > diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > index 984a2d5896..592560c3ef 100644 > --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml > @@ -261,6 +261,7 @@ > > > > + >5002091 >0 >43100242
Re: Ways to deal with broken machine types
On Mon, 29 Mar 2021 15:46:53 +0100 "Dr. David Alan Gilbert" wrote: > * Igor Mammedov (imamm...@redhat.com) wrote: > > On Tue, 23 Mar 2021 17:40:36 + > > Daniel P. Berrangé wrote: > > > > > On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote: > > > > Let me hijack this thread for beyond this case scope. > > > > > > > > I agree that for this particular bug we've done all we could, but > > > > there is broader issue to discuss here. > > > > > > > > We have machine versions to deal with hw compatibility issues and that > > > > covers most of the cases, > > > > but occasionally we notice problem well after release(s), > > > > so users may be stuck with broken VM and need to manually fix > > > > configuration (and/or VM). > > > > Figuring out what's wrong and how to fix it is far from trivial. So > > > > lets discuss if we > > > > can help to ease this pain, yes it will be late for first victims but > > > > it's still > > > > better than never. > > > > > > To summarize the problem situation > > > > > > - We rely on a machine type version to encode a precise guest ABI. > > > - Due a bug, we are in a situation where the same machine type > > >encodes two distinct guest ABIs due to a mistake introduced > > >betwen QEMU N-2 and N-1 > > > - We want to fix the bug in QEMU N > > > - For incoming migration there is no way to distinguish between > > >the ABIs used in N-2 and N-1, to pick the right one > > > > > > So we're left with an unwinnable problem: > > > > > > - Not fixing the bug => > > > > > >a) user migrating N-2 to N-1 have ABI change > > >b) user migrating N-2 to N have ABI change > > >c) user migrating N-1 to N are fine > > > > > > No mitigation for (a) or (b) > > > > > > - Fixing the bug => > > > > > >a) user migrating N-2 to N-1 have ABI change. > > >b) user migrating N-2 to N are fine > > >c) user migrating N-1 to N have ABI change > > > > > > Bad situations (a) and (c) are mitigated by > > > backporting fix to N-1-stable too. 
> > > > > > Generally we have preferred to fix the bug, because we have > > > usually identified them fairly quickly after release, and > > > backporting the fix to stable has been sufficient mitigation > > > against ill effects. Basically the people left broken are a > > > relatively small set out of the total userbase. > > > > > > The real challenge arises when we are slow to identify the > > > problem, such that we have a large number of people impacted. > > > > > > > > > > I'll try to sum up idea Michael suggested (here comes my unorganized > > > > brain-dump), > > > > > > > > 1. We can keep in VM's config QEMU version it was created on > > > >and as minimum warn user with a pointer to known issues if version in > > > >config mismatches version of actually used QEMU, with a knob to > > > > silence > > > >it for particular mismatch. > > > > > > > > When an issue becomes know and resolved we know for sure how and what > > > > changed and embed instructions on what options to use for fixing up VM's > > > > config to preserve old HW config depending on QEMU version VM was > > > > installed on. > > > > > > > some more ideas: > > > >2. let mgmt layer to keep fixup list and apply them to config if > > > > available > > > >(user would need to upgrade mgmt or update fixup list somehow) > > > >3. let mgmt layer to pass VM's QEMU version to currently used QEMU, > > > > so > > > > that QEMU could maintain and apply fixups based on QEMU version + > > > > machine type. > > > > The user will have to upgrade to newer QEMU to get/use new > > > > fixups. > > > > > > The nice thing about machine type versioning is that we are treating the > > > versions as opaque strings which represent a specific ABI, regardless of > > > the QEMU version. This means that even if distros backport fixes for bugs > > > or even new features, the machine type compatibility check remains a >
Re: Ways to deal with broken machine types
On Tue, 23 Mar 2021 17:40:36 + Daniel P. Berrangé wrote: > On Tue, Mar 23, 2021 at 05:54:47PM +0100, Igor Mammedov wrote: > > Let me hijack this thread for beyond this case scope. > > > > I agree that for this particular bug we've done all we could, but > > there is broader issue to discuss here. > > > > We have machine versions to deal with hw compatibility issues and that > > covers most of the cases, > > but occasionally we notice problem well after release(s), > > so users may be stuck with broken VM and need to manually fix configuration > > (and/or VM). > > Figuring out what's wrong and how to fix it is far from trivial. So lets > > discuss if we > > can help to ease this pain, yes it will be late for first victims but it's > > still > > better than never. > > To summarize the problem situation > > - We rely on a machine type version to encode a precise guest ABI. > - Due a bug, we are in a situation where the same machine type >encodes two distinct guest ABIs due to a mistake introduced >betwen QEMU N-2 and N-1 > - We want to fix the bug in QEMU N > - For incoming migration there is no way to distinguish between >the ABIs used in N-2 and N-1, to pick the right one > > So we're left with an unwinnable problem: > > - Not fixing the bug => > >a) user migrating N-2 to N-1 have ABI change >b) user migrating N-2 to N have ABI change >c) user migrating N-1 to N are fine > > No mitigation for (a) or (b) > > - Fixing the bug => > >a) user migrating N-2 to N-1 have ABI change. >b) user migrating N-2 to N are fine >c) user migrating N-1 to N have ABI change > > Bad situations (a) and (c) are mitigated by > backporting fix to N-1-stable too. > > Generally we have preferred to fix the bug, because we have > usually identified them fairly quickly after release, and > backporting the fix to stable has been sufficient mitigation > against ill effects. Basically the people left broken are a > relatively small set out of the total userbase. 
> > The real challenge arises when we are slow to identify the > problem, such that we have a large number of people impacted. > > > > I'll try to sum up idea Michael suggested (here comes my unorganized > > brain-dump), > > > > 1. We can keep in VM's config QEMU version it was created on > >and as minimum warn user with a pointer to known issues if version in > >config mismatches version of actually used QEMU, with a knob to silence > >it for particular mismatch. > > > > When an issue becomes know and resolved we know for sure how and what > > changed and embed instructions on what options to use for fixing up VM's > > config to preserve old HW config depending on QEMU version VM was installed > > on. > > > some more ideas: > >2. let mgmt layer to keep fixup list and apply them to config if > > available > >(user would need to upgrade mgmt or update fixup list somehow) > >3. let mgmt layer to pass VM's QEMU version to currently used QEMU, so > > that QEMU could maintain and apply fixups based on QEMU version + > > machine type. > > The user will have to upgrade to newer QEMU to get/use new fixups. > > The nice thing about machine type versioning is that we are treating the > versions as opaque strings which represent a specific ABI, regardless of > the QEMU version. This means that even if distros backport fixes for bugs > or even new features, the machine type compatibility check remains a > simple equality comparsion. > > As soon as you introduce the QEMU version though, we have created a > large matrix for compatibility. This matrix is expanded if a distro > chooses to backport fixes for any of the machine type bugs to their > stable streams. This can get particularly expensive when there are > multiple streams a distro is maintaining. > > *IF* the original N-1 qemu has a property that could be queried by > the mgmt app to identify a machine type bug, then we could potentially > apply a fixup automatically. 
> > eg query-machines command in QEMU version N could report against > "pc-i440fx-5.0", that there was a regression fix that has to be > applied if property "foo" had value "bar". > > Now, the mgmt app wants to migrate from QEMU N-2 or N-1 to QEMU N. > It can query the value of "foo" on the source QEMU with qom-get. > It now knows whether it has to override this property "foo" when > spawning QEMU N on the target.
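Daniel's proposal could look roughly like this on the mgmt side. "foo"/"bar" are the placeholders from the mail, not real properties; qom-get is a real QMP command, but the decision logic here is a sketch of the idea only:

```python
def fixup_override(qom_get_reply, prop="foo", buggy_value="bar"):
    """Decide whether the target QEMU needs a compat override: the
    mgmt app runs qom-get for the flagged property on the source and,
    if the regressed value is still in use, pins it on the destination
    command line so the guest ABI is preserved across the fix."""
    value = qom_get_reply.get("return")
    if value == buggy_value:
        return {prop: buggy_value}   # keep the regressed ABI alive
    return {}                        # source already has the fixed ABI
```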
Ways to deal with broken machine types
On Tue, 23 Mar 2021 16:04:11 +0100 Thomas Lamprecht wrote: > On 23.03.21 15:55, Vitaly Cheptsov wrote: > >> On 23 March 2021, at 17:48, Michael S. Tsirkin wrote: > >> > >> The issue is with people who installed a VM using 5.1 qemu, > >> migrated to 5.2, booted there and set a config on a device > >> e.g. IP on a NIC. > >> They now have a 5.1 machine type but changing uid back > >> like we do will break these VMs. > >> > >> Unlikely to be common but let's at least create a way for these people > >> to use these VMs. > >> > > They can simply set the 5.2 VM version in such a case. I do not want to > let this legacy hack be enabled in any modern QEMU VM version, as it > violates the ACPI specification and makes life more difficult for various > other software like bootloaders and operating systems. > > Yeah, here I agree with Vitaly, if they already used 5.2 and made some > configurations > for those "new" devices they can just keep using 5.2? > > If some of the devices got configured on 5.1 and some on 5.2 there's nothing > we can > do anyway, from a QEMU POV - there the user always needs to choose one machine > version > and fix up the devices configured while on the other machine. According to testing, it appears that the issue affects virtio drivers, so it may lead to a failure to boot the guest (and there was at least one report about virtio-scsi being affected). Let me hijack this thread beyond the scope of this case. I agree that for this particular bug we've done all we could, but there is a broader issue to discuss here. We have machine versions to deal with hardware compatibility issues, and that covers most of the cases, but occasionally we notice a problem well after release(s), so users may be stuck with a broken VM and need to fix the configuration (and/or VM) manually. Figuring out what's wrong and how to fix it is far from trivial. So let's discuss whether we can help to ease this pain; yes, it will be too late for the first victims, but it's still better than never. 
I'll try to sum up the idea Michael suggested (here comes my unorganized brain-dump):

1. We can keep in the VM's config the QEMU version it was created on, and as a minimum warn the user with a pointer to known issues if the version in the config mismatches the version of the QEMU actually used, with a knob to silence the warning for a particular mismatch.

Once an issue becomes known and resolved, we know for sure how and what changed, and can embed instructions on what options to use for fixing up the VM's config to preserve the old HW config, depending on the QEMU version the VM was installed on.

Some more ideas:

2. Let the mgmt layer keep a fixup list and apply it to the config if available (the user would need to upgrade mgmt or update the fixup list somehow).

3. Let the mgmt layer pass the VM's QEMU version to the currently used QEMU, so that QEMU could maintain and apply fixups based on QEMU version + machine type. The user will have to upgrade to a newer QEMU to get/use new fixups.

In my opinion, both would lead to an explosion of 'possibly needed' properties for each change we introduce in hw/firmware (read: ACPI), and very possibly a lot of conditional branches in QEMU code. I'm afraid it would make QEMU harder to maintain => more bugs in the future. It would also blow up the test matrix for downstreams who care about testing.

If we proactively gate changes on properties, we can just update fixup lists in mgmt without needing to update QEMU (aka Insights rules), at the cost of complexity on the QEMU side. Alternatively, we can be conservative in spawning new properties, meaning we create them only when an issue is fixed and require users to update QEMU so that the fixups can be applied to the VM.

Feel free to shoot the messenger down, or suggest ways we can deal with the problem.
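Idea 2 above could be a mgmt-layer table keyed by the QEMU version a VM was created on plus its machine type. The entries and property names below are invented purely for illustration:

```python
# Hypothetical fixup table for idea 2: each entry pins properties to the
# values the VM actually saw when it was installed, so a later QEMU with
# the bug fixed can still reproduce the old guest ABI.
FIXUPS = {
    ("5.1.0", "pc-i440fx-5.1"): {"x-example-compat-prop": "old"},
}

def fixups_for(created_on_version, machine_type):
    """Look up the property overrides to splice into the VM config."""
    return FIXUPS.get((created_on_version, machine_type), {})
```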
Re: [PATCH] Deprecate pmem=on with non-DAX capable backend file
On Mon, 11 Jan 2021 15:33:32 -0500 Igor Mammedov wrote: > It is not safe to pretend that emulated NVDIMM supports > persistence while backend actually failed to enable it > and used non-persistent mapping as fall back. > Instead of falling-back, QEMU should be more strict and > error out with clear message that it's not supported. > So if user asks for persistence (pmem=on), they should > store backing file on NVDIMM. > > Signed-off-by: Igor Mammedov > Reviewed-by: Philippe Mathieu-Daudé > --- > v2: > rephrase deprecation comment and warning message > (Philippe Mathieu-Daudé ) I've posted this as v1 though it's v2 and it looks like it fell through the cracks; can someone pick it up if it looks fine, please? > --- > docs/system/deprecated.rst | 17 + > util/mmap-alloc.c | 3 +++ > 2 files changed, 20 insertions(+) > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > index bacd76d7a5..e79fb02b3a 100644 > --- a/docs/system/deprecated.rst > +++ b/docs/system/deprecated.rst > @@ -327,6 +327,23 @@ The Raspberry Pi machines come in various models (A, A+, > B, B+). To be able > to distinguish which model QEMU is implementing, the ``raspi2`` and > ``raspi3`` > machines have been renamed ``raspi2b`` and ``raspi3b``. > > +Backend options > +--- > + > +Using non-persistent backing file with pmem=on (since 6.0) > +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' > + > +This option is used when ``memory-backend-file`` is consumed by emulated > NVDIMM > +device. However enabling ``memory-backend-file.pmem`` option, when backing > file > +is (a) not DAX capable or (b) not on a filesystem that supports direct mapping > +of persistent memory, is not safe and may lead to data loss or corruption in > case > +of host crash. 
> +Options are: > +- modify VM configuration to set ``pmem=off`` to continue using fake > NVDIMM > + (without persistence guaranties) with backing file on non DAX storage > +- move backing file to NVDIMM storage and keep ``pmem=on`` > + (to have NVDIMM with persistence guaranties). > + > Device options > -- > > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c > index 27dcccd8ec..0388cc3be2 100644 > --- a/util/mmap-alloc.c > +++ b/util/mmap-alloc.c > @@ -20,6 +20,7 @@ > #include "qemu/osdep.h" > #include "qemu/mmap-alloc.h" > #include "qemu/host-utils.h" > +#include "qemu/error-report.h" > > #define HUGETLBFS_MAGIC 0x958458f6 > > @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, > "crash.\n", file_name); > g_free(proc_link); > g_free(file_name); > +warn_report("Using non DAX backing file with 'pmem=on' option" > +" is deprecated"); > } > /* > * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC,
Re: [PATCH 1/2] qemu_capabilities: Introduce QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID
On Tue, 12 Jan 2021 20:59:11 +0100 Peter Krempa wrote: > On Tue, Jan 12, 2021 at 20:24:44 +0100, Igor Mammedov wrote: > > On Tue, 12 Jan 2021 18:41:38 + > > Daniel P. Berrangé wrote: > > > > > On Tue, Jan 12, 2021 at 07:28:45PM +0100, Peter Krempa wrote: > > > > On Tue, Jan 12, 2021 at 19:20:58 +0100, Igor Mammedov wrote: > > > > > On Tue, 12 Jan 2021 12:35:19 +0100 > > > > > Peter Krempa wrote > > [...] > > > > Yeah it is pretty dubious on the QEMU side to have used an "x-" prefix > > > here at all, when use of this option is mandatory to make migration > > > work :-( > > > > if the general consensus is to drop the prefix, I can post a QEMU patch to do so > > and let downstream(s) carry the burden. > > It really depends on the situation, because the commit messages don't > seem to describe it satisfactorily. > > Basically we don't want to ever use a qemu property knob, which qemu is > free to change arbitrarily. > > If the property is to be used with any upcoming qemu version we must get > a guarantee that it will not change. There are two options basically: > > 1) 'x-' is dropped > 1a) we will use it with qemu-6.0 and later > ( everything is clean, but users will have to update qemu to fix it ) I have thought about it some more, (modulo the downstream issue) dropping the prefix will effectively exclude old QEMU (5.0-5.2) even though the feature is available there. > 1b) we will carry code which will use the 'x-' prefixed version from its > inception until qemu-5.2, when we will hard-mask it out and add > plenty of comments outlining that this is not what we do normally > (it will be okay for past releases, since they will not change) 5.2 is not enough, it should be carried as long as the 4.0 machine type exists. On the QEMU side, once the 4.0 machine type is removed, we can deprecate and remove the no-longer-needed option, so libvirt (with these patches) would see that it no longer exists and not put it on the CLI anymore. Only after that is it probably ok to drop the code for it. 
> > 2) qemu declares the option stable with the 'x-' prefix >We'll require that any place even in the code which declares the >option has an appropriate comment preventing anybody from changing >it. > >We'll then add also cautionary comments discouraging use of it. I've just resent v2 of the QEMU patch that incorporates your suggestions. > > 3) qemu fixes the issue without libvirt's involvement if it were possible without the option, I'd go for it in the first place. Unfortunately, it's too late for that now. > For us really 1a) and 3 is acceptable without any comments. Other > options will require extraordinary measures to prevent using this as > prior art in using any other x-prefixed features from qemu. > > in 1a) case, downstreams can obviously backport the qemu patch renaming > the feature and libvirt will require no change at all > > Now the question is whether we want to make migration work between the > affected releases which will depend on what to use. If we can help it, then yes. That's why I resent the QEMU patch keeping the 'x-' prefix (with your feedback included).
Re: [PATCH 1/2] qemu_capabilities: Introduce QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID
On Tue, 12 Jan 2021 18:41:38 + Daniel P. Berrangé wrote: > On Tue, Jan 12, 2021 at 07:28:45PM +0100, Peter Krempa wrote: > > On Tue, Jan 12, 2021 at 19:20:58 +0100, Igor Mammedov wrote: > > > On Tue, 12 Jan 2021 12:35:19 +0100 > > > Peter Krempa wrote: > > > > > > > On Tue, Jan 12, 2021 at 12:29:58 +0100, Michal Privoznik wrote: > > > > > On 1/12/21 12:19 PM, Peter Krempa wrote: > > > > > > On Tue, Jan 12, 2021 at 09:29:49 +0100, Michal Privoznik wrote: > > > > > > > This capability tracks whether memory-backend-file has > > > > > > > "x-use-canonical-path-for-ramblock-id" attribute. Introduced into > > > > > > > QEMU by commit v4.0.0-rc0~189^2. While "x-" prefix is considered > > > > > > > > > > > > > > > > > > > Please use a commit hash instead of this. > > > > > > > > > > > > > experimental or internal to QEMU, the next commit justifies its > > > > > > > use. > > > > > > > > > > > > NACK unless qemu adds a statement to their code and documentation > > > > > > that > > > > > > the this property is considered stable despite the 'x-prefix' and > > > > > > you > > > > > > add a link to the appropriate qemu upstream commit once it's done. > > > > > > > > > > > > We don't want to depend on experimental stuff so we need a strong > > > > > > excuse. > > > > > > > > > > > > > > > > That's done in the next commit. Do you want me to copy it here too? I > > > > > figured I'd put the justification where I'm actually setting the > > > > > internal > > > > > knob. > > > > > > > > Yes, because this is also mentioning the an 'x-' prefixed property. I > > > > want to be absolutely clear in any places (including a comment in the > > > > code, which you also should add into the capability code) that this is > > > > extraordinary circumstance and that qemu is actually considering that > > > > property stable. > > > > > > the only reason to keep x- prefix in this case is to cause less issues for > > > downstream QEMUs. 
Since this compat property is copied to their own > > > machine types. > > > If we keep the prefix downstream doesn't have to do anything, if we rename it, > > > then downstreams have to carry a separate patch that does the same for > > > their old machine types. > > That would be okay if it's limited to past versions, but in this > > instance it is not. Allowing x-prefixed properties for any future > > release is a dangerous precedent. If we want to allow to detect the > > capability also for future releases, we must declare that it's for a very > > particular reason and also that qemu will not delete it at will. > > > > This is to prevent any future discussions of unwarranted usage of > > x-prefixed properties in libvirt. > > Yeah it is pretty dubious on the QEMU side to have used an "x-" prefix > here at all, when use of this option is mandatory to make migration > work :-( if the general consensus is to drop the prefix, I can post a QEMU patch to do so and let downstream(s) carry the burden. > > Regards, > Daniel
Re: [PATCH 2/2] qemu: Do not Use canonical path for system memory
On Tue, 12 Jan 2021 09:29:50 +0100 Michal Privoznik wrote: > In commit v6.9.0-rc1~450 I've adapted libvirt to QEMU's deprecation of > -mem-path and -mem-prealloc and switched to memory-backend-* even for > system memory. My claim was that that's what QEMU does under the hood > anyway. And indeed it was: see QEMU commit v5.0.0-rc0~75^2~1^2~76 and > look at function create_default_memdev(). > > However, then commit v5.0.0-rc1~11^2~3 was merged into QEMU. While it > was fixing a bug, it also changed the create_default_memdev() function > in which it started turning off use of canonical path (by setting > "x-use-canonical-path-for-ramblock-id" attribute to false). This wasn't > documented until QEMU commit XXX. The path affects migration - the same > path has to be used on the source and on the destination. Therefore, if > there is old guest started with '-m X' it has "pc.ram" block which > doesn't use canonical path and thus when migrating to newer QEMU which > uses memory-backend-* we have to turn off the canonical path explicitly. > Otherwise, "/objects/pc.ram" path would be expected by QEMU which > doesn't match the source. > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1912201 > Signed-off-by: Michal Privoznik > --- > > I'll replace both occurrences of 'QEMU commit XXX' once QEMU patch is > merged. 
> > src/qemu/qemu_command.c | 30 --- > src/qemu/qemu_command.h | 3 +- > src/qemu/qemu_hotplug.c | 2 +- > .../hugepages-memaccess3.x86_64-latest.args | 4 +-- > 4 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > index 6f970a3128..b99d4e5faf 100644 > --- a/src/qemu/qemu_command.c > +++ b/src/qemu/qemu_command.c > @@ -2950,7 +2950,8 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > qemuDomainObjPrivatePtr priv, > const virDomainDef *def, > const virDomainMemoryDef *mem, > -bool force) > +bool force, > +bool systemMemory) > { > const char *backendType = "memory-backend-file"; > virDomainNumatuneMemMode mode; > @@ -2967,6 +2968,7 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > bool needHugepage = !!pagesize; > bool useHugepage = !!pagesize; > int discard = mem->discard; > +bool useCanonicalPath = true; > > /* The difference between @needHugepage and @useHugepage is that the > latter > * is true whenever huge page is defined for the current memory cell. > @@ -3081,6 +3083,9 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) > return -1; > > +if (systemMemory) > +useCanonicalPath = false; > + > } else if (useHugepage || mem->nvdimmPath || memAccess || > def->mem.source == VIR_DOMAIN_MEMORY_SOURCE_FILE) { > > @@ -3122,10 +3127,27 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > *backendProps, > > if (qemuBuildMemoryBackendPropsShare(props, memAccess) < 0) > return -1; > + > +if (systemMemory) > +useCanonicalPath = false; > + > } else { > backendType = "memory-backend-ram"; > } > > +/* This is a terrible hack, but unfortunately there is no better way. > + * The replacement for '-m X' argument is not simple '-machine > + * memory-backend' and '-object memory-backend-*,size=X' (which was the > + * idea). 
This is because of create_default_memdev() in QEMU sets > + * 'x-use-canonical-path-for-ramblock-id' attribute to false and is > + * documented in QEMU in qemu-options.hx under 'memory-backend'. > + * See QEMU commit XXX. > + */ > +if (!useCanonicalPath && > +virQEMUCapsGet(priv->qemuCaps, > QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID) && > +virJSONValueObjectAdd(props, > "b:x-use-canonical-path-for-ramblock-id", false, NULL) < 0) > +return -1; is it possible to do it only for old machine types <= 4.0, to limit hack exposure? > if (!priv->memPrealloc && > virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0) > return -1; > @@ -3237,7 +3259,7 @@ qemuBuildMemoryCellBackendStr(virDomainDefPtr def, > mem.info.alias = alias; > > if ((rc = qemuBuildMemoryBackendProps(&props, alias, cfg, > - priv, def, &mem, false)) < 0) > + priv, def, &mem, false, false)) < > 0) > return -1; > > if (virQEMUBuildObjectCommandlineFromJSON(buf, props) < 0) > @@ -3266,7 +3288,7 @@ qemuBuildMemoryDimmBackendStr(virBufferPtr buf, > alias = g_strdup_printf("mem%s", mem->info.alias); > > if (qemuBuildMemoryBacken
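Condensed into Python pseudocode, the decision the hunk above makes looks like this (the field and property names follow the libvirt patch, but this is only a sketch of the logic, not libvirt code):

```python
# Sketch of the logic in qemuBuildMemoryBackendProps above: for system
# memory ("pc.ram") on a QEMU that reports the property, turn the
# canonical ramblock path off so migration from '-m X' guests still works.

CANONICAL_PATH_PROP = "x-use-canonical-path-for-ramblock-id"

def backend_props(system_memory, qemu_caps, size="4G"):
    props = {"size": size}
    if system_memory and CANONICAL_PATH_PROP in qemu_caps:
        # keep the legacy "pc.ram" ramblock id instead of "/objects/pc.ram"
        props[CANONICAL_PATH_PROP] = False
    return props
```

Igor's question below (limiting this to machine types <= 4.0) would amount to adding a machine-type check to the `if` above.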
Re: [PATCH 1/2] qemu_capabilities: Introduce QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID
On Tue, 12 Jan 2021 12:35:19 +0100 Peter Krempa wrote: > On Tue, Jan 12, 2021 at 12:29:58 +0100, Michal Privoznik wrote: > > On 1/12/21 12:19 PM, Peter Krempa wrote: > > > On Tue, Jan 12, 2021 at 09:29:49 +0100, Michal Privoznik wrote: > > > > This capability tracks whether memory-backend-file has > > > > "x-use-canonical-path-for-ramblock-id" attribute. Introduced into > > > > QEMU by commit v4.0.0-rc0~189^2. While "x-" prefix is considered > > > > > > Please use a commit hash instead of this. > > > > > > > experimental or internal to QEMU, the next commit justifies its > > > > use. > > > > > > NACK unless qemu adds a statement to their code and documentation that > > > this property is considered stable despite the 'x-' prefix and you > > > add a link to the appropriate qemu upstream commit once it's done. > > > > > > We don't want to depend on experimental stuff so we need a strong > > > excuse. > > > > > > > That's done in the next commit. Do you want me to copy it here too? I > > figured I'd put the justification where I'm actually setting the internal > > knob. > > Yes, because this is also mentioning an 'x-' prefixed property. I > want to be absolutely clear in all places (including a comment in the > code, which you also should add into the capability code) that this is > an extraordinary circumstance and that qemu is actually considering that > property stable. the only reason to keep the x- prefix in this case is to cause fewer issues for downstream QEMUs, since this compat property is copied to their own machine types. If we keep the prefix, downstream doesn't have to do anything; if we rename it, then downstreams have to carry a separate patch that does the same for their old machine types. > I want to prevent that this commit will be used as an excuse to depend > on experimental properties which are not actually considered > non-experimental. >
[PATCH] Deprecate pmem=on with non-DAX capable backend file
It is not safe to pretend that emulated NVDIMM supports persistence while backend actually failed to enable it and used non-persistent mapping as fall back. Instead of falling-back, QEMU should be more strict and error out with clear message that it's not supported. So if user asks for persistence (pmem=on), they should store backing file on NVDIMM. Signed-off-by: Igor Mammedov Reviewed-by: Philippe Mathieu-Daudé --- v2: rephrase deprecation comment and warning message (Philippe Mathieu-Daudé ) --- docs/system/deprecated.rst | 17 + util/mmap-alloc.c | 3 +++ 2 files changed, 20 insertions(+) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index bacd76d7a5..e79fb02b3a 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -327,6 +327,23 @@ The Raspberry Pi machines come in various models (A, A+, B, B+). To be able to distinguish which model QEMU is implementing, the ``raspi2`` and ``raspi3`` machines have been renamed ``raspi2b`` and ``raspi3b``. +Backend options +--- + +Using non-persistent backing file with pmem=on (since 6.0) +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +This option is used when ``memory-backend-file`` is consumed by emulated NVDIMM +device. However enabling ``memory-backend-file.pmem`` option, when backing file +is (a) not DAX capable or (b) not on a filesystem that supports direct mapping +of persistent memory, is not safe and may lead to data loss or corruption in case +of host crash. +Options are: +- modify VM configuration to set ``pmem=off`` to continue using fake NVDIMM + (without persistence guarantees) with backing file on non DAX storage +- move backing file to NVDIMM storage and keep ``pmem=on`` + (to have NVDIMM with persistence guarantees). 
+ Device options -- diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c index 27dcccd8ec..0388cc3be2 100644 --- a/util/mmap-alloc.c +++ b/util/mmap-alloc.c @@ -20,6 +20,7 @@ #include "qemu/osdep.h" #include "qemu/mmap-alloc.h" #include "qemu/host-utils.h" +#include "qemu/error-report.h" #define HUGETLBFS_MAGIC 0x958458f6 @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, "crash.\n", file_name); g_free(proc_link); g_free(file_name); +warn_report("Using non DAX backing file with 'pmem=on' option" +" is deprecated"); } /* * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC, -- 2.27.0
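The rule the patch enforces can be illustrated with a small sketch. Real QEMU detects DAX capability by attempting an mmap with MAP_SYNC | MAP_SHARED_VALIDATE; checking mount options for ``dax``, as done below, is only a rough stand-in for illustration:

```python
# Rough illustration only: approximate "is this backing file on
# DAX-capable storage?" by looking for the 'dax' mount option of the
# longest matching mountpoint. QEMU itself tests mmap(MAP_SYNC) instead.

def pmem_backing_ok(path, mounts):
    """mounts: iterable of (mountpoint, options) tuples."""
    matching = [m for m in mounts if path.startswith(m[0])]
    if not matching:
        return False
    mountpoint, options = max(matching, key=lambda m: len(m[0]))
    return "dax" in options.split(",")
```

With ``pmem=on`` and a backing file for which such a check fails, the patch above emits a deprecation warning; a later release can then turn that into a hard error.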
Re: [PATCH] Deprecate pmem=on with non-DAX capable backend file
On Tue, 29 Dec 2020 19:04:58 +0100 Philippe Mathieu-Daudé wrote: > On 12/29/20 6:29 PM, Igor Mammedov wrote: > > It is not safe to pretend that emulated NVDIMM supports > > persistence while backend actually failed to enable it > > and used non-persistent mapping as fall back. > > Instead of falling-back, QEMU should be more strict and > > error out with clear message that it's not supported. > > So if user asks for persistence (pmem=on), they should > > store backing file on NVDIMM. > > > > Signed-off-by: Igor Mammedov > > --- > > docs/system/deprecated.rst | 14 ++ > > util/mmap-alloc.c | 3 +++ > > 2 files changed, 17 insertions(+) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index bacd76d7a5..ba4f6ed2fe 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -327,6 +327,20 @@ The Raspberry Pi machines come in various models (A, > > A+, B, B+). To be able > > to distinguish which model QEMU is implementing, the ``raspi2`` and > > ``raspi3`` > > machines have been renamed ``raspi2b`` and ``raspi3b``. > > > > +Backend options > > +--- > > + > > +Using non-persistent backing file with pmem=on (since 6.0) > > +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' > > + > > +This option is used when ``memory-backend-file`` is consumed by emulated > > NVDIMM > > +device. However enabling ``memory-backend-file.pmem`` option, when backing > > file > > +is not DAX capable or not on a filesystem that support direct mapping of > > persistent > > Maybe clearer enumerating? As: > "is a) not DAX capable or b) not on a filesystem that support direct > mapping of persistent" will change it to your variant in v2 > > > +memory, is not safe and may lead to data loss or corruption in case of > > host crash. > > +Using pmem=on option with such file will return error, instead of a > > warning. > > Not sure the difference between warn/err is important in the doc. 
not many care about warnings until QEMU starts fine, I've mentioned the error here so that whoever reads this would know what to expect > > > +Options are to move backing file to NVDIMM storage or modify VM > > configuration > > +to set ``pmem=off`` to continue using fake NVDIMM without persistence > > guaranties. > > Maybe: > > The possibilities to continue using fake NVDIMM (without persistence > guaranties) are: > - move backing file to NVDIMM storage > - modify VM configuration to set ``pmem=off`` only the latter is faking nvdimm; the first is a properly emulated one with persistence guarantees. Maybe: Options are: - modify VM configuration to set ``pmem=off`` to continue using fake NVDIMM (without persistence guarantees) with backing file on non DAX storage - move backing file to NVDIMM storage and keep ``pmem=on``, to have NVDIMM with persistence guarantees. > > + > > Device options > > -- > > > > diff --git a/util/mmap-alloc.c > > index 27dcccd8ec..d226273a98 100644 > > --- a/util/mmap-alloc.c > > +++ b/util/mmap-alloc.c > > @@ -20,6 +20,7 @@ > > #include "qemu/osdep.h" > > #include "qemu/mmap-alloc.h" > > #include "qemu/host-utils.h" > > +#include "qemu/error-report.h" > > > > #define HUGETLBFS_MAGIC 0x958458f6 > > > > @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, > > "crash.\n", file_name); > > g_free(proc_link); > > g_free(file_name); > > +warn_report("Deprecated using non DAX backing file with" > > +" pmem=on option"); > > Maybe "Using non DAX backing file with 'pmem=on' option is deprecated"? ok > > Beside the nitpicking comments, > Reviewed-by: Philippe Mathieu-Daudé > > > } > > /* > > * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC, > > >
[PATCH] Deprecate pmem=on with non-DAX capable backend file
It is not safe to pretend that emulated NVDIMM supports persistence while backend actually failed to enable it and used non-persistent mapping as fall back. Instead of falling-back, QEMU should be more strict and error out with clear message that it's not supported. So if user asks for persistence (pmem=on), they should store backing file on NVDIMM. Signed-off-by: Igor Mammedov --- docs/system/deprecated.rst | 14 ++ util/mmap-alloc.c | 3 +++ 2 files changed, 17 insertions(+) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index bacd76d7a5..ba4f6ed2fe 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -327,6 +327,20 @@ The Raspberry Pi machines come in various models (A, A+, B, B+). To be able to distinguish which model QEMU is implementing, the ``raspi2`` and ``raspi3`` machines have been renamed ``raspi2b`` and ``raspi3b``. +Backend options +--- + +Using non-persistent backing file with pmem=on (since 6.0) +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +This option is used when ``memory-backend-file`` is consumed by emulated NVDIMM +device. However enabling ``memory-backend-file.pmem`` option, when backing file +is not DAX capable or not on a filesystem that support direct mapping of persistent +memory, is not safe and may lead to data loss or corruption in case of host crash. +Using pmem=on option with such file will return error, instead of a warning. +Options are to move backing file to NVDIMM storage or modify VM configuration +to set ``pmem=off`` to continue using fake NVDIMM without persistence guaranties. 
+ Device options -- diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c index 27dcccd8ec..d226273a98 100644 --- a/util/mmap-alloc.c +++ b/util/mmap-alloc.c @@ -20,6 +20,7 @@ #include "qemu/osdep.h" #include "qemu/mmap-alloc.h" #include "qemu/host-utils.h" +#include "qemu/error-report.h" #define HUGETLBFS_MAGIC 0x958458f6 @@ -166,6 +167,8 @@ void *qemu_ram_mmap(int fd, "crash.\n", file_name); g_free(proc_link); g_free(file_name); +warn_report("Deprecated using non DAX backing file with" +" pmem=on option"); } /* * if map failed with MAP_SHARED_VALIDATE | MAP_SYNC, -- 2.27.0
[RFC 3/5] pci: introduce acpi-index property for PCI device
In the x86/ACPI world, since systemd v197, linux distros are using predictable network interface naming. On QEMU based VMs this results in the path-based naming scheme, which names network interfaces based on PCI topology. With this, one has to plug the NIC into exactly the same bus/slot that was used when the disk image was first provisioned/configured, or one risks losing network configuration due to the NIC being renamed to match the actually used topology. That also restricts the freedom to reshape the PCI configuration of the VM without the need to reconfigure the guest image. systemd also offers an "onboard" naming scheme, which is preferred over the PCI slot/topology one, provided that the firmware implements: " PCI Firmware Specification 3.1 4.6.7. _DSM for Naming a PCI or PCI Express Device Under Operating Systems " which allows assigning a user-defined index to a PCI device, which systemd will use to name the NIC. For example, using -device e1000,acpi-index=100 the guest will rename the NIC to 'eno100', where 'eno' is the default prefix for the "onboard" naming scheme. This doesn't require any advance configuration on the guest side. The hope is that 'acpi-index' will be easier to consume by the management layer, compared to forcing a specific PCI topology and/or having several disk image templates for different topologies, and that it will help to simplify the process of spawning a VM from the same template without the need to reconfigure the guest network configuration. this patch adds the 'acpi-index'* property and wires up (abuses) unused pci hotplug registers to pass the index value to AML code at runtime. A following patch will add the corresponding _DSM code and wire it up to PCI devices described in ACPI. 
*) name comes from linux kernel terminology Signed-off-by: Igor Mammedov --- CC: libvir-list@redhat.com include/hw/acpi/pcihp.h | 7 ++- include/hw/pci/pci.h| 1 + hw/acpi/pci.c | 6 ++ hw/acpi/pcihp.c | 25 - hw/i386/acpi-build.c| 10 ++ hw/pci/pci.c| 1 + 6 files changed, 48 insertions(+), 2 deletions(-) diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h index dfd375820f..72d1773ca1 100644 --- a/include/hw/acpi/pcihp.h +++ b/include/hw/acpi/pcihp.h @@ -46,6 +46,7 @@ typedef struct AcpiPciHpPciStatus { typedef struct AcpiPciHpState { AcpiPciHpPciStatus acpi_pcihp_pci_status[ACPI_PCIHP_MAX_HOTPLUG_BUS]; uint32_t hotplug_select; +uint32_t acpi_index; PCIBus *root; MemoryRegion io; bool legacy_piix; @@ -71,6 +72,8 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool acpihp_root_off); extern const VMStateDescription vmstate_acpi_pcihp_pci_status; +bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id); + #define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp) \ VMSTATE_UINT32_TEST(pcihp.hotplug_select, state, \ test_pcihp), \ @@ -78,6 +81,8 @@ extern const VMStateDescription vmstate_acpi_pcihp_pci_status; ACPI_PCIHP_MAX_HOTPLUG_BUS, \ test_pcihp, 1, \ vmstate_acpi_pcihp_pci_status, \ - AcpiPciHpPciStatus) + AcpiPciHpPciStatus), \ +VMSTATE_UINT32_TEST(pcihp.acpi_index, state, \ +vmstate_acpi_pcihp_use_acpi_index) #endif diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 259f9c992d..e592532558 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -357,6 +357,7 @@ struct PCIDevice { /* ID of standby device in net_failover pair */ char *failover_pair_id; +uint32_t acpi_index; }; void pci_register_bar(PCIDevice *pci_dev, int region_num, diff --git a/hw/acpi/pci.c b/hw/acpi/pci.c index 9510597a19..07d5101d83 100644 --- a/hw/acpi/pci.c +++ b/hw/acpi/pci.c @@ -27,6 +27,7 @@ #include "hw/acpi/aml-build.h" #include "hw/acpi/pci.h" #include "hw/pci/pcie_host.h" +#include "hw/acpi/pcihp.h" void build_mcfg(GArray *table_data, BIOSLinker *linker, 
AcpiMcfgInfo *info) { @@ -59,3 +60,8 @@ void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info) "MCFG", table_data->len - mcfg_start, 1, NULL, NULL); } +bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id) +{ + AcpiPciHpState *s = opaque; + return s->acpi_index; +} diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c index 9dc4d3e2db..9634567e3a 100644 --- a/hw/acpi/pcihp.c +++ b/hw/acpi/pcihp.c @@ -347,7 +347,8 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) trace_acpi_pci_down_read(val); break; case PCI_EJ_BASE: -/* No feature defined yet */ +val = s->acpi_index; +s->acpi_index = 0; trace_acpi_pci_features_read(val);
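The naming behaviour described in the commit message can be sketched as follows; this mirrors systemd's scheme only loosely (the real implementation lives in systemd's udev net_id builtin) and is not its actual code:

```python
# Loose sketch of systemd's predictable-NIC-naming preference: the
# "onboard" scheme (eno<index>, fed by the firmware's ACPI index via the
# _DSM above) beats the PCI-path scheme (enp<bus>s<slot>).

def predictable_name(acpi_index=None, pci_bus=0, pci_slot=0):
    if acpi_index:  # e.g. set via: -device e1000,acpi-index=100
        return f"eno{acpi_index}"
    return f"enp{pci_bus}s{pci_slot}"
```

This is why the same guest image keeps its network configuration regardless of where the NIC lands in the PCI topology, as long as the same acpi-index is passed.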
Re: [PATCH] qemu: Relax memory pre-allocation rules
On Mon, 30 Nov 2020 11:14:20 + Daniel P. Berrangé wrote: > On Mon, Nov 30, 2020 at 11:48:28AM +0100, Michal Privoznik wrote: > > On 11/30/20 11:16 AM, Daniel P. Berrangé wrote: > > > On Mon, Nov 30, 2020 at 11:06:14AM +0100, Michal Privoznik wrote: > > > > Currently, we configure QEMU to prealloc memory almost by > > > > default. Well, by default for NVDIMMs, hugepages and if user > > > > asked us to (via ``<memoryBacking/>``). > > > > > > > > However, there are two cases where this approach is not the best: > > > > > > > > 1) in case when guest's NVDIMM is backed by real life NVDIMM. In > > > > this case users should put ``<pmem/>`` into the device > > > > ``<source/>``, like this: > > > > > > > > <memory model='nvdimm'> > > > > <source> > > > > <path>/dev/pmem0</path> > > > > <pmem/> > > > > </source> > > > > </memory> > > > > > > > > Instructing QEMU to do prealloc in this case means that each > > > > page of the NVDIMM is "touched" (the first byte is read and > > > > written back - see QEMU commit v2.9.0-rc1~26^2) which contributes to > > > > device wear. > > > > > > > > 2) if free-page-reporting is turned on. While the > > > > free-page-reporting feature might not have a catchy or obvious > > > > name, when enabled it instructs KVM and subsequently QEMU to > > > > free pages no longer used by guest resulting in smaller memory > > > > footprint. And preallocating whole memory goes against this. > > > > > > > > The BZ comment 11 mentions another, third case 'virtio-mem' but > > > > that is not implemented in libvirt, yet. 
> > > > > > > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1894053 > > > > Signed-off-by: Michal Privoznik > > > > --- > > > > src/qemu/qemu_command.c | 11 +-- > > > > .../memory-hotplug-nvdimm-pmem.x86_64-latest.args | 2 +- > > > > 2 files changed, 10 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > > > > index 479bcc0b0c..3df8b5ac76 100644 > > > > --- a/src/qemu/qemu_command.c > > > > +++ b/src/qemu/qemu_command.c > > > > @@ -2977,7 +2977,11 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > > > > *backendProps, > > > > if (discard == VIR_TRISTATE_BOOL_ABSENT) > > > > discard = def->mem.discard; > > > > -if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE) > > > > +/* The whole point of free_page_reporting is that as soon as guest > > > > frees > > > > + * any memory it is freed in the host too. Prealloc doesn't make > > > > much sense > > > > + * then. */ > > > > +if (def->mem.allocation == VIR_DOMAIN_MEMORY_ALLOCATION_IMMEDIATE > > > > && > > > > +def->memballoon->free_page_reporting != VIR_TRISTATE_SWITCH_ON) > > > > prealloc = true; > > > > > > If the user asked for allocation == immediate, we should not be > > > silently ignoring that request. Isn't the scenario described simply > > > a wierd user configuration scenario and if they don't want that, then > > > then they can set instead. > > > > Okay. > > > > > > > > > if (virDomainNumatuneGetMode(def->numa, mem->targetNode, &mode) < > > > > 0 && > > > > @@ -3064,7 +3068,10 @@ qemuBuildMemoryBackendProps(virJSONValuePtr > > > > *backendProps, > > > > if (mem->nvdimmPath) { > > > > memPath = g_strdup(mem->nvdimmPath); > > > > -prealloc = true; > > > > > > > > > > > > > +/* If the NVDIMM is a real device then there's nothing to > > > > prealloc. > > > > + * If anyhing, we would be only wearing off the device. 
*/ > > > > +if (!mem->nvdimmPmem) > > > > +prealloc = true; > > > I wonder if QEMU itself should take this optimization to skip its > > > allocation logic ? by default QEMU does not prealloc, and if users explicitly ask for prealloc, they should get it. So libvirt also shouldn't set prealloc by default when it comes to nvdimm on a file that's allocated on pmem enabled storage. > > Also would make sense. This is that kind of bug which lies in between > > libvirt and qemu. Although, since we are worried about silently ignoring user > > requests, then wouldn't this be exactly what QEMU would be doing? I mean, if > > a user/libvirt put both .prealloc=yes and .pmem=yes onto cmd line then > > these would cancel out, wouldn't they? > > The difference is that a real NVDIMM is inherently preallocated. QEMU that's assuming the used backend file is on an NVDIMM (pmem=on doesn't guarantee it though) > would not be ignoring the prealloc=yes arg - its implementation would > merely be a no-op. As for ignoring the user's input, I don't like it (it usually bites down the road). if we decide that "pmem=on + prealloc=on" is an invalid combo, I'd rather error out with a "fix your CLI" kind of message, or we can warn the user that the combination of options is not optimal. > Regards, > Daniel
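Putting the exchange together, the pre-allocation policy being converged on looks roughly like this (the field names mirror the libvirt patch, but the function itself is only a sketch, not libvirt code):

```python
# Sketch of the prealloc policy discussed above: an explicit user request
# ("immediate" allocation) always wins; otherwise skip prealloc for real
# pmem-backed NVDIMMs (inherently allocated, touching pages only wears
# the device) and keep it for hugepages and file-backed fake NVDIMMs.

def want_prealloc(allocation_immediate, uses_hugepages,
                  is_nvdimm, nvdimm_is_real_pmem):
    if allocation_immediate:
        # Daniel's point: never silently ignore an explicit request,
        # even when free-page-reporting is enabled
        return True
    if is_nvdimm:
        return not nvdimm_is_real_pmem
    return uses_hugepages
```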
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
On Fri, 23 Oct 2020 11:54:40 -0400 "Michael S. Tsirkin" wrote: > On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote: > > Hi David, > > > > On Fri, Oct 23, 2020 at 6:49 AM David Gibson wrote: > > > > On Thu, 22 Oct 2020 11:01:04 -0400 > > "Michael S. Tsirkin" wrote: > > > > > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote: > > > [...] > > > > > > Right. After detecting just failing unconditionally it a bit too > > > simplistic IMHO. > > > > There's also another factor here, which I thought I'd mentioned > > already, but looks like I didn't: I think we're still missing some > > details in what's going on. > > > > The premise for this patch is that plugging while the indicator is in > > transition state is allowed to fail in any way on the guest side. I > > don't think that's a reasonable interpretation, because it's unworkable > > for physical hotplug. If the indicator starts blinking while you're > > in > > the middle of shoving a card in, you'd be in trouble. > > > > So, what I'm assuming here is that while "don't plug while blinking" is > > the instruction for the operator to obey as best they can, on the guest > > side the rule has to be "start blinking, wait a while and by the time > > you leave blinking state again, you can be confident any plugs or > > unplugs have completed". Obviously still racy in the strict computer > > science sense, but about the best you can do with slow humans in the > > mix. > > > > So, qemu should of course endeavour to follow that rule as though it > > was a human operator on a physical machine and not plug when the > > indicator is blinking. *But* the qemu plug will in practice be fast > > enough that if we're hitting real problems here, it suggests the guest > > is still doing something wrong. > > > > > > I personally think there is a little bit of over-engineering here. 
> > Let's start with the spec: > > > >   Power Indicator Blinking > >   A blinking Power Indicator indicates that the slot is powering up or powering down and that insertion or removal of the adapter is not permitted. > > > > What exactly is an interpretation here? > > As you stated, the races are theoretical; the whole point of the indicator > > is to let the operator know he can't plug the device just yet. > > > > I understand it would be more user friendly if QEMU would wait internally for the blinking to end, but the whole point of the indicator is to let the operator (human or machine) know they can't plug the device at a specific time. > > Should QEMU take over the responsibility of the operator? Is it even correct? > > > > Even if we wanted such a feature, how is it related to this patch? > > The patch simply refuses to start a hotplug operation when it knows it will not succeed. > > > > Another way that would make sense to me would be a new QEMU interface other than "add_device", let's say "adding_device_allowed", that would return true if the hotplug is allowed at this point in time. (I am aware of the theoretical races) > > Rather than adding_device_allowed, something like "query slot" > might be helpful for debugging. That would help the user figure out > e.g. why a device isn't visible, without any races. Would a new command be useful though? What we end up with is a broken guest (if I read the commit message right) and a user who has no idea whether device_add was successful or not. So what should the user do in this case: - wait till it explodes? - can the user remove it, or will it be stuck there forever? - poll the slot before hotplug, manually? (if this is the case, then failing device_add cleanly doesn't sound bad; it looks similar to another error we have, "/* Check if hot-plug is disabled on the slot */", in pcie_cap_slot_pre_plug_cb) CCing libvirt, as this concerns not only QEMU. 
> > > The above will at least mimic the mechanics of the physical world. The operator looks at the indicator, > > the management software checks if adding the device is allowed. > > Since it is a corner case I would prefer device_add to fail rather than introducing a new interface, > > but that's just me. > > > > Thanks, > > Marcel > > > > I think we want the QEMU management interface to be reasonably > abstract and agnostic if possible. Pushing knowledge of hardware > details to management will just lead to pain IMHO. > We have supported device_add, which practically never fails, for years, For CPUs and RAM, device_add can fail, so maybe management is also prepared to handle errors on the PCI hotplug path. > at this point it's easier to keep supporting it than > change all users ... > > -- > > David Gibson > > Principal Software Engineer, Virtualization, Red Hat
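The refusal the patch implements hinges on reading the slot's Power Indicator state. Here is a minimal, hypothetical sketch of that check, using the Power Indicator Control field of the PCIe Slot Control register (bits 9:8; the mask and field values below match the spec encoding, but the helper itself is not QEMU's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCIe Slot Control register, Power Indicator Control field (bits 9:8). */
#define PCI_EXP_SLTCTL_PIC            0x0300  /* field mask */
#define PCI_EXP_SLTCTL_PWR_IND_ON     0x0100
#define PCI_EXP_SLTCTL_PWR_IND_BLINK  0x0200
#define PCI_EXP_SLTCTL_PWR_IND_OFF    0x0300

/*
 * Hypothetical helper: a blinking indicator means the slot is powering
 * up or down, and insertion/removal of an adapter is not permitted --
 * so a hotplug request arriving in that window would be refused.
 */
static bool slot_accepts_hotplug(uint16_t slot_ctl)
{
    return (slot_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_BLINK;
}
```

This is the mechanical core; the thread's disagreement is about policy, i.e. whether QEMU should fail device_add on this condition or leave the decision to the management layer.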
Re: [PATCH REBASE 7/7] qemu: Use memory-backend-* for regular guest memory
On Tue, 15 Sep 2020 10:59:04 +0100 Daniel P. Berrangé wrote: > On Tue, Sep 15, 2020 at 11:53:56AM +0200, Igor Mammedov wrote: > > On Tue, 15 Sep 2020 10:54:46 +0200 > > Michal Privoznik wrote: > > > > > On 9/8/20 3:55 PM, Ján Tomko wrote: > > > > On a Tuesday in 2020, Michal Privoznik wrote: > > > > > > >> diff --git > > > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> > > > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> index 5d256c42bc..b43e7d9c3c 100644 > > > >> --- > > > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> +++ > > > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > > > >> @@ -12,14 +12,16 @@ QEMU_AUDIO_DRV=none \ > > > >> -S \ > > > >> -object secret,id=masterKey0,format=raw,\ > > > >> file=/tmp/lib/domain--1-instance-0092/master-key.aes \ > > > >> --machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off \ > > > >> +-machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off,\ > > > >> +memory-backend=pc.ram \ > > > >> -cpu qemu64 \ > > > >> -m 14336 \ > > > >> --mem-prealloc \ > > > >> +-object > > > >> memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,\ > > > >> +share=yes,prealloc=yes,size=15032385536 \ > > > >> -overcommit mem-lock=off \ > > > >> -smp 8,sockets=1,dies=1,cores=8,threads=1 \ > > > >> -object > > > >> memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,\ > > > >> -share=yes,size=15032385536,host-nodes=3,policy=preferred \ > > > >> +share=yes,prealloc=yes,size=15032385536,host-nodes=3,policy=preferred > > > >> \ > > > >> -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \ > > > >> -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ > > > >> -display none \ > > > > > > > > Should we format all the fields twice in these cases? > > > > > > Ah, good question. Honestly, I don't remember, it was slightly longer > > > ago that I've written these patches. 
Igor, do you perhaps remember > > > whether libvirt needs to specify both: -machine memory-backend=$id and > > > -object memory-backend-*,id=$id? > > > > the latter defines the backend and the former uses it, > > short answer is yes. > > > > you do not need > > --mem-prealloc > > if you explicitly set "prealloc=yes" on the backend. > > > > I'd prefer it if libvirt stopped using the old -mem-prealloc and -mem-path > > in favor of explicit properties on the backend, so QEMU could deprecate > > them and drop the aliasing code, which uses a global-properties hack. > > IIRC, we tried to do that in the past and the change to use a backend > impacted migration ABI compatibility. For new machine types that shouldn't happen, as they use memory-backend internally (assuming the CLI isn't messed up). Old machine types should cope with the switch too; the only thing we were not able to convert was "-numa node,mem" => "-numa memdev", due to the odd sizes 'mem' allowed, which is why "-numa node,mem" was preserved for old machine types. > > > Regards, > Daniel
[PATCH v3] cphp: remove deprecated cpu-add command(s)
These were deprecated since 4.0, remove both HMP and QMP variants. Users should use device_add command instead. To get list of possible CPUs and options, use 'info hotpluggable-cpus' HMP or query-hotpluggable-cpus QMP command. Signed-off-by: Igor Mammedov Reviewed-by: Thomas Huth Acked-by: Dr. David Alan Gilbert Reviewed-by: Michal Privoznik Acked-by: Cornelia Huck --- v2,3: fix typos in commit message include/hw/boards.h | 1 - include/hw/i386/pc.h| 1 - include/monitor/hmp.h | 1 - docs/system/deprecated.rst | 25 + hmp-commands.hx | 15 -- hw/core/machine-hmp-cmds.c | 12 - hw/core/machine-qmp-cmds.c | 12 - hw/i386/pc.c| 27 -- hw/i386/pc_piix.c | 1 - hw/s390x/s390-virtio-ccw.c | 12 - qapi/machine.json | 24 - tests/qtest/cpu-plug-test.c | 100 tests/qtest/test-hmp.c | 1 - 13 files changed, 21 insertions(+), 211 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 795910d01b..7abd5d889c 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -169,7 +169,6 @@ struct MachineClass { void (*init)(MachineState *state); void (*reset)(MachineState *state); void (*wakeup)(MachineState *state); -void (*hot_add_cpu)(MachineState *state, const int64_t id, Error **errp); int (*kvm_type)(MachineState *machine, const char *arg); void (*smp_parse)(MachineState *ms, QemuOpts *opts); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 421a77acc2..79b7ab17bc 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -135,7 +135,6 @@ extern int fd_bootchk; void pc_acpi_smi_interrupt(void *opaque, int irq, int level); -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp); void pc_smp_parse(MachineState *ms, QemuOpts *opts); void pc_guest_info_init(PCMachineState *pcms); diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index c986cfd28b..642e9e91f9 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -89,7 +89,6 @@ void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_change(Monitor *mon, 
const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void hmp_chardev_send_break(Monitor *mon, const QDict *qdict); -void hmp_cpu_add(Monitor *mon, const QDict *qdict); void hmp_object_add(Monitor *mon, const QDict *qdict); void hmp_object_del(Monitor *mon, const QDict *qdict); void hmp_info_memdev(Monitor *mon, const QDict *qdict); diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index a158e765c3..c43c53f432 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -284,13 +284,6 @@ The ``query-cpus`` command is replaced by the ``query-cpus-fast`` command. The ``arch`` output member of the ``query-cpus-fast`` command is replaced by the ``target`` output member. -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional -details. - ``query-events`` (since 4.0) '''''''''''''''''''''''''''' @@ -306,12 +299,6 @@ the 'wait' field, which is only applicable to sockets in server mode Human Monitor Protocol (HMP) commands - -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional details. - ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -521,6 +508,12 @@ QEMU Machine Protocol (QMP) commands The "autoload" parameter has been ignored since 2.12.0. All bitmaps are automatically loaded from qcow2 images. +``cpu-add`` (removed in 5.2) +''''''''''''''''''''''''''''
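The replacement flow the commit message points at can be sketched at the HMP monitor. The CPU type, IDs, and topology properties below are hypothetical and depend on the machine; 'info hotpluggable-cpus' reports the actual values to use:

```
(qemu) info hotpluggable-cpus
(qemu) device_add qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=0,thread-id=0
```

The first command lists each possible CPU slot with the exact properties device_add expects, which is what makes the generic device_add path a full substitute for the removed cpu-add.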
Re: [PATCH REBASE 7/7] qemu: Use memory-backend-* for regular guest memory
On Tue, 15 Sep 2020 10:54:46 +0200 Michal Privoznik wrote: > On 9/8/20 3:55 PM, Ján Tomko wrote: > > On a Tuesday in 2020, Michal Privoznik wrote: > > >> diff --git > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> index 5d256c42bc..b43e7d9c3c 100644 > >> --- > >> a/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> +++ > >> b/tests/qemuxml2argvdata/memfd-memory-default-hugepage.x86_64-latest.args > >> @@ -12,14 +12,16 @@ QEMU_AUDIO_DRV=none \ > >> -S \ > >> -object secret,id=masterKey0,format=raw,\ > >> file=/tmp/lib/domain--1-instance-0092/master-key.aes \ > >> --machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off \ > >> +-machine pc-i440fx-2.3,accel=kvm,usb=off,dump-guest-core=off,\ > >> +memory-backend=pc.ram \ > >> -cpu qemu64 \ > >> -m 14336 \ > >> --mem-prealloc \ > >> +-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,\ > >> +share=yes,prealloc=yes,size=15032385536 \ > >> -overcommit mem-lock=off \ > >> -smp 8,sockets=1,dies=1,cores=8,threads=1 \ > >> -object > >> memory-backend-memfd,id=ram-node0,hugetlb=yes,hugetlbsize=2097152,\ > >> -share=yes,size=15032385536,host-nodes=3,policy=preferred \ > >> +share=yes,prealloc=yes,size=15032385536,host-nodes=3,policy=preferred \ > >> -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \ > >> -uuid 126f2720-6f8e-45ab-a886-ec9277079a67 \ > >> -display none \ > > > > Should we format all the fields twice in these cases? > > Ah, good question. Honestly, I don't remember, it was slightly longer > ago that I've written these patches. Igor, do you perhaps remember > whether libvirt needs to specify both: -machine memory-backend=$id and > -object memory-backend-*,id=$id? the later defines backend and the former uses it, short answer is yes. you do not need --mem-prealloc if you explicitly set "prealloc=yes" on backend. 
I'd prefer it if libvirt stopped using the old -mem-prealloc and -mem-path in favor of explicit properties on the backend, so QEMU could deprecate them and drop the aliasing code, which uses a global-properties hack. Also, if '-machine memory-backend=' is used and '-m' only sets the initial RAM size, then '-m' can be omitted, as the size will be derived from the backend in use. > > Michal
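Igor's answer — define the backend with -object and consume it through the machine property — can be sketched as a command line. The IDs and size here are hypothetical, not taken from the test case above:

```
qemu-system-x86_64 \
  -machine pc,memory-backend=pc.ram \
  -object memory-backend-memfd,id=pc.ram,share=yes,prealloc=yes,size=4G
```

With this form, -mem-prealloc is replaced by the explicit prealloc=yes property, and (per the note above) -m can be dropped as well, since the guest RAM size is derived from the backend.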
Re: [PATCH v2] cphp: remove deprecated cpu-add command(s)
On Mon, 14 Sep 2020 10:07:36 +0200 Michal Privoznik wrote: > On 9/14/20 9:46 AM, Igor Mammedov wrote: > > these were deprecated since 4.0, remove both HMP and QMP variants. > > > > Users should use the device_add command instead. To get the list of > > possible CPUs and options, use the 'info hotpluggable-cpus' HMP > > or query-hotpluggable-cpus QMP command. > > > > Signed-off-by: Igor Mammedov > > Reviewed-by: Thomas Huth > > Acked-by: Dr. David Alan Gilbert > > --- > > include/hw/boards.h | 1 - > > include/hw/i386/pc.h| 1 - > > include/monitor/hmp.h | 1 - > > docs/system/deprecated.rst | 25 + > > hmp-commands.hx | 15 -- > > hw/core/machine-hmp-cmds.c | 12 - > > hw/core/machine-qmp-cmds.c | 12 - > > hw/i386/pc.c| 27 -- > > hw/i386/pc_piix.c | 1 - > > hw/s390x/s390-virtio-ccw.c | 12 - > > qapi/machine.json | 24 - > > tests/qtest/cpu-plug-test.c | 100 > > tests/qtest/test-hmp.c | 1 - > > 13 files changed, 21 insertions(+), 211 deletions(-) > > Thanks to Peter, Libvirt uses device_add instead of cpu_add whenever > possible. Hence this is okay from Libvirt's POV. we should make libvirt switch from -numa node,cpus= to -numa cpu= to get rid of the 'last' interface that uses cpu-index as input. To help libvirt migrate existing configs from the older syntax to the newer one, we could introduce a field x-cpu-index in the query-hotpluggable-cpus output (with a goal to deprecate it in a few years). Would that work for you? > > Reviewed-by: Michal Privoznik Thanks! > > Michal >
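The syntax migration Igor proposes can be sketched side by side; the node and topology IDs here are hypothetical, and which topology properties identify a CPU depends on the machine type:

```
# legacy, cpu-index based assignment (the interface to retire):
-numa node,nodeid=0,cpus=0-3

# replacement, topology-property based assignment:
-numa cpu,node-id=0,socket-id=0
```

The second form binds CPUs to NUMA nodes by socket/core/thread properties rather than by bare cpu-index, which is why a helper such as the suggested x-cpu-index field would ease translating existing configs.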
[PATCH v2] cphp: remove deprecated cpu-add command(s)
theses were deprecated since 4.0, remove both HMP and QMP variants. Users should use device_add command instead. To get list of possible CPUs and options, use 'info hotpluggable-cpus' HMP or query-hotpluggable-cpus QMP command. Signed-off-by: Igor Mammedov Reviewed-by: Thomas Huth Acked-by: Dr. David Alan Gilbert --- include/hw/boards.h | 1 - include/hw/i386/pc.h| 1 - include/monitor/hmp.h | 1 - docs/system/deprecated.rst | 25 + hmp-commands.hx | 15 -- hw/core/machine-hmp-cmds.c | 12 - hw/core/machine-qmp-cmds.c | 12 - hw/i386/pc.c| 27 -- hw/i386/pc_piix.c | 1 - hw/s390x/s390-virtio-ccw.c | 12 - qapi/machine.json | 24 - tests/qtest/cpu-plug-test.c | 100 tests/qtest/test-hmp.c | 1 - 13 files changed, 21 insertions(+), 211 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 795910d01b..7abd5d889c 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -169,7 +169,6 @@ struct MachineClass { void (*init)(MachineState *state); void (*reset)(MachineState *state); void (*wakeup)(MachineState *state); -void (*hot_add_cpu)(MachineState *state, const int64_t id, Error **errp); int (*kvm_type)(MachineState *machine, const char *arg); void (*smp_parse)(MachineState *ms, QemuOpts *opts); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 421a77acc2..79b7ab17bc 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -135,7 +135,6 @@ extern int fd_bootchk; void pc_acpi_smi_interrupt(void *opaque, int irq, int level); -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp); void pc_smp_parse(MachineState *ms, QemuOpts *opts); void pc_guest_info_init(PCMachineState *pcms); diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index c986cfd28b..642e9e91f9 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -89,7 +89,6 @@ void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_change(Monitor *mon, const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void 
hmp_chardev_send_break(Monitor *mon, const QDict *qdict); -void hmp_cpu_add(Monitor *mon, const QDict *qdict); void hmp_object_add(Monitor *mon, const QDict *qdict); void hmp_object_del(Monitor *mon, const QDict *qdict); void hmp_info_memdev(Monitor *mon, const QDict *qdict); diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index a158e765c3..c43c53f432 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -284,13 +284,6 @@ The ``query-cpus`` command is replaced by the ``query-cpus-fast`` command. The ``arch`` output member of the ``query-cpus-fast`` command is replaced by the ``target`` output member. -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional -details. - ``query-events`` (since 4.0) '''''''''''''''''''''''''''' @@ -306,12 +299,6 @@ the 'wait' field, which is only applicable to sockets in server mode Human Monitor Protocol (HMP) commands - -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional details. - ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -521,6 +508,12 @@ QEMU Machine Protocol (QMP) commands The "autoload" parameter has been ignored since 2.12.0. All bitmaps are automatically loaded from qcow2 images. +``cpu-add`` (removed in 5.2) +'''''''''''''''''''''''''''' + +Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See +documentation of `
Re: [PATCH] smp: drop support for deprecated (invalid topologies)
On Fri, 11 Sep 2020 11:04:47 -0400 "Michael S. Tsirkin" wrote: > On Fri, Sep 11, 2020 at 09:32:02AM -0400, Igor Mammedov wrote: > > it's was deprecated since 3.1 > > > > Support for invalid topologies is removed, the user must ensure > > that topologies described with -smp include all possible cpus, > > i.e. (sockets * cores * threads) == maxcpus or QEMU will > > exit with error. > > > > Signed-off-by: Igor Mammedov > > Acked-by: > > memory tree I guess? It would be better for Paolo to take it since he has queued numa deprecations, due to context confilict in deprecated.rst. Paolo, can you queue this patch as well? > > > --- > > docs/system/deprecated.rst | 26 +- > > hw/core/machine.c | 16 > > hw/i386/pc.c | 16 > > 3 files changed, 21 insertions(+), 37 deletions(-) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index 122717cfee..d737728fab 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -47,19 +47,6 @@ The 'file' driver for drives is no longer appropriate > > for character or host > > devices and will only accept regular files (S_IFREG). The correct driver > > for these file types is 'host_cdrom' or 'host_device' as appropriate. > > > > -``-smp`` (invalid topologies) (since 3.1) > > -''''''''''''''''''''''''''''''''''''''''' > > - > > -CPU topology properties should describe whole machine topology including > > -possible CPUs. > > - > > -However, historically it was possible to start QEMU with an incorrect > > topology > > -where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, > > -which could lead to an incorrect topology enumeration by the guest. > > -Support for invalid topologies will be removed, the user must ensure > > -topologies described with -smp include all possible cpus, i.e. > > -*sockets* * *cores* * *threads* = *maxcpus*. 
> > - > > ``-vnc acl`` (since 4.0.0) > > '''''''''''''''''''''''''' > > > > @@ -618,6 +605,19 @@ New machine versions (since 5.1) will not accept the > > option but it will still > > work with old machine types. User can check the QAPI schema to see if the > > legacy > > option is supported by looking at MachineInfo::numa-mem-supported property. > > > > +``-smp`` (invalid topologies) (removed 5.2) > > +''''''''''''''''''''''''''''''''''''''''''' > > + > > +CPU topology properties should describe whole machine topology including > > +possible CPUs. > > + > > +However, historically it was possible to start QEMU with an incorrect > > topology > > +where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, > > +which could lead to an incorrect topology enumeration by the guest. > > +Support for invalid topologies is removed, the user must ensure > > +topologies described with -smp include all possible cpus, i.e. > > +*sockets* * *cores* * *threads* = *maxcpus*. > > + > > Block devices > > - > > > > diff --git a/hw/core/machine.c b/hw/core/machine.c > > index ea26d61237..09aee4ea52 100644 > > --- a/hw/core/machine.c > > +++ b/hw/core/machine.c > > @@ -754,23 +754,15 @@ static void smp_parse(MachineState *ms, QemuOpts > > *opts) > > exit(1); > > } > > > > -if (sockets * cores * threads > ms->smp.max_cpus) { > > -error_report("cpu topology: " > > - "sockets (%u) * cores (%u) * threads (%u) > " > > - "maxcpus (%u)", > > +if (sockets * cores * threads != ms->smp.max_cpus) { > > +error_report("Invalid CPU topology: " > > + "sockets (%u) * cores (%u) * threads (%u) " > > + "!= maxcpus (%u)", > > sockets, cores, threads, > >
[PATCH] smp: drop support for deprecated (invalid topologies)
it's was deprecated since 3.1 Support for invalid topologies is removed, the user must ensure that topologies described with -smp include all possible cpus, i.e. (sockets * cores * threads) == maxcpus or QEMU will exit with error. Signed-off-by: Igor Mammedov --- docs/system/deprecated.rst | 26 +- hw/core/machine.c | 16 hw/i386/pc.c | 16 3 files changed, 21 insertions(+), 37 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 122717cfee..d737728fab 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -47,19 +47,6 @@ The 'file' driver for drives is no longer appropriate for character or host devices and will only accept regular files (S_IFREG). The correct driver for these file types is 'host_cdrom' or 'host_device' as appropriate. -``-smp`` (invalid topologies) (since 3.1) -''''''''''''''''''''''''''''''''''''''''' - -CPU topology properties should describe whole machine topology including -possible CPUs. - -However, historically it was possible to start QEMU with an incorrect topology -where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, -which could lead to an incorrect topology enumeration by the guest. -Support for invalid topologies will be removed, the user must ensure -topologies described with -smp include all possible cpus, i.e. -*sockets* * *cores* * *threads* = *maxcpus*. - ``-vnc acl`` (since 4.0.0) '''''''''''''''''''''''''' @@ -618,6 +605,19 @@ New machine versions (since 5.1) will not accept the option but it will still work with old machine types. User can check the QAPI schema to see if the legacy option is supported by looking at MachineInfo::numa-mem-supported property. +``-smp`` (invalid topologies) (removed 5.2) +''''''''''''''''''''''''''''''''''''''''''' + +CPU topology properties should describe whole machine topology including +possible CPUs. 
+ +However, historically it was possible to start QEMU with an incorrect topology +where *n* <= *sockets* * *cores* * *threads* < *maxcpus*, +which could lead to an incorrect topology enumeration by the guest. +Support for invalid topologies is removed, the user must ensure +topologies described with -smp include all possible cpus, i.e. +*sockets* * *cores* * *threads* = *maxcpus*. + Block devices - diff --git a/hw/core/machine.c b/hw/core/machine.c index ea26d61237..09aee4ea52 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -754,23 +754,15 @@ static void smp_parse(MachineState *ms, QemuOpts *opts) exit(1); } -if (sockets * cores * threads > ms->smp.max_cpus) { -error_report("cpu topology: " - "sockets (%u) * cores (%u) * threads (%u) > " - "maxcpus (%u)", +if (sockets * cores * threads != ms->smp.max_cpus) { +error_report("Invalid CPU topology: " + "sockets (%u) * cores (%u) * threads (%u) " + "!= maxcpus (%u)", sockets, cores, threads, ms->smp.max_cpus); exit(1); } -if (sockets * cores * threads != ms->smp.max_cpus) { -warn_report("Invalid CPU topology deprecated: " -"sockets (%u) * cores (%u) * threads (%u) " -"!= maxcpus (%u)", -sockets, cores, threads, -ms->smp.max_cpus); -} - ms->smp.cpus = cpus; ms->smp.cores = cores; ms->smp.threads = threads; diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d071da787b..fbde6b04e6 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -746,23 +746,15 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts) exit(1); } -if (sockets * dies * cores * threads > ms->smp.max_cpus) { -error_report("cpu topology: " - "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > " - "maxcpus (%u)", +if (sockets * dies * cores * threads != ms->smp.max_cpus) { +error_report("Invalid CPU topology deprecated: " + "sockets (%u) * dies (%u) * cores (%u) * threads (%u) " + "!= maxcpus (%u)", sockets, dies, cores, threads, ms->smp.max_cpus); exit(1); } -if (sockets * dies * cores * threads != ms->smp.max_cpus) { -warn_report("Invalid CPU topology 
deprecated: " -"sockets (%u) * dies (%u) * cores (%u) * threads (%u) " -"!= maxcpus (%u)", -sockets, dies, cores, threads, -ms->smp.max_cpus); -} - ms->smp.cpus = cpus; ms->smp.cores = cores; ms->smp.threads = threads; -- 2.27.0
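The condition the patch turns from a warning into a hard error is a single product check. A minimal sketch of the accepted topologies (simplified; the real code also validates cpus <= maxcpus and, on PC, a dies factor):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the -smp validation enforced above: a topology is valid
 * only if it describes every possible CPU exactly, i.e.
 * sockets * cores * threads == maxcpus.
 */
static bool smp_topology_valid(unsigned sockets, unsigned cores,
                               unsigned threads, unsigned maxcpus)
{
    return sockets * cores * threads == maxcpus;
}
```

Under the removed behavior, a product strictly between cpus and maxcpus only produced a deprecation warning; after this patch such configurations make QEMU exit with an error.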
[PATCH] cphp: remove deprecated cpu-add command(s)
theses were deprecatedince since 4.0, remove both HMP and QMP variants. Users should use device_add commnad instead. To get list of possible CPUs and options, use 'info hotpluggable-cpus' HMP or query-hotpluggable-cpus QMP command. Signed-off-by: Igor Mammedov --- include/hw/boards.h | 1 - include/hw/i386/pc.h| 1 - include/monitor/hmp.h | 1 - docs/system/deprecated.rst | 25 + hmp-commands.hx | 15 -- hw/core/machine-hmp-cmds.c | 12 - hw/core/machine-qmp-cmds.c | 12 - hw/i386/pc.c| 27 -- hw/i386/pc_piix.c | 1 - hw/s390x/s390-virtio-ccw.c | 12 - qapi/machine.json | 24 - tests/qtest/cpu-plug-test.c | 100 tests/qtest/test-hmp.c | 1 - 13 files changed, 21 insertions(+), 211 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index bc5b82ad20..2163843bdb 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -173,7 +173,6 @@ struct MachineClass { void (*init)(MachineState *state); void (*reset)(MachineState *state); void (*wakeup)(MachineState *state); -void (*hot_add_cpu)(MachineState *state, const int64_t id, Error **errp); int (*kvm_type)(MachineState *machine, const char *arg); void (*smp_parse)(MachineState *ms, QemuOpts *opts); diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index fe52e165b2..ca8ff6cd27 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -137,7 +137,6 @@ extern int fd_bootchk; void pc_acpi_smi_interrupt(void *opaque, int irq, int level); -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp); void pc_smp_parse(MachineState *ms, QemuOpts *opts); void pc_guest_info_init(PCMachineState *pcms); diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h index c986cfd28b..642e9e91f9 100644 --- a/include/monitor/hmp.h +++ b/include/monitor/hmp.h @@ -89,7 +89,6 @@ void hmp_chardev_add(Monitor *mon, const QDict *qdict); void hmp_chardev_change(Monitor *mon, const QDict *qdict); void hmp_chardev_remove(Monitor *mon, const QDict *qdict); void hmp_chardev_send_break(Monitor *mon, const QDict *qdict); 
-void hmp_cpu_add(Monitor *mon, const QDict *qdict); void hmp_object_add(Monitor *mon, const QDict *qdict); void hmp_object_del(Monitor *mon, const QDict *qdict); void hmp_info_memdev(Monitor *mon, const QDict *qdict); diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 851dbdeb8a..122717cfee 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -284,13 +284,6 @@ The ``query-cpus`` command is replaced by the ``query-cpus-fast`` command. The ``arch`` output member of the ``query-cpus-fast`` command is replaced by the ``target`` output member. -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional -details. - ``query-events`` (since 4.0) '''''''''''''''''''''''''''' @@ -306,12 +299,6 @@ the 'wait' field, which is only applicable to sockets in server mode Human Monitor Protocol (HMP) commands - -``cpu-add`` (since 4.0) -''''''''''''''''''''''' - -Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See -documentation of ``query-hotpluggable-cpus`` for additional details. - ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -514,6 +501,12 @@ QEMU Machine Protocol (QMP) commands The "autoload" parameter has been ignored since 2.12.0. All bitmaps are automatically loaded from qcow2 images. +``cpu-add`` (removed in 5.2) +'''''''''''''''''''''''''''' + +Use ``device_add`` for hotplugging vCPUs instead of ``cpu-add``. See +documentation of ``query-hotpluggable-cpus`` for additional details. +
[PATCH 2/3] doc: Cleanup "'-mem-path' fallback to RAM" deprecation text
it was actually removed in 5.0, commit 68a86dc15c (numa: remove deprecated -mem-path fallback to anonymous RAM) clean up forgotten remnants in docs. Signed-off-by: Igor Mammedov --- docs/system/deprecated.rst | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 6f9441005a..f252c92901 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -104,17 +104,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-mem-path`` fallback to RAM (since 4.1) -''''''''''''''''''''''''''''''''''''''''' - -Currently if guest RAM allocation from file pointed by ``mem-path`` -fails, QEMU falls back to allocating from RAM, which might result -in unpredictable behavior since the backing file specified by the user -is ignored. In the future, users will be responsible for making sure -the backing storage specified with ``-mem-path`` can actually provide -the guest RAM configured with ``-m`` and QEMU will fail to start up if -RAM allocation is unsuccessful. - RISC-V ``-bios`` (since 5.1) '''''''''''''''''''''''''''' @@ -624,6 +613,16 @@ New machine versions (since 5.1) will not accept the option but it will still work with old machine types. User can check the QAPI schema to see if the legacy option is supported by looking at MachineInfo::numa-mem-supported property. +``-mem-path`` fallback to RAM (remove 5.0) +'''''''''''''''''''''''''''''''''''''''''' + +If guest RAM allocation from file pointed by ``mem-path`` failed, +QEMU was falling back to allocating from RAM, which might have resulted +in unpredictable behavior since the backing file specified by the user +as ignored. Currently, users are responsible for making sure the backing storage +specified with ``-mem-path`` can actually provide the guest RAM configured with +``-m`` and QEMU fails to start up if RAM allocation is unsuccessful. 
+ Block devices - -- 2.27.0
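With the fallback removed, a ``-mem-path`` guest only starts if the backing storage can really satisfy the configured RAM. A minimal sketch of what that means in practice (paths and sizes are illustrative, not taken from the patch):

```shell
# Reserve enough 2 MiB hugepages to cover the guest RAM first;
# QEMU now fails to start instead of silently falling back to anonymous RAM.
echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
qemu-system-x86_64 -m 1G -mem-path /dev/hugepages
```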
[PATCH 3/3] numa: remove fixup numa_state->num_nodes to MAX_NODES
The current code permits only node IDs in the [0..MAX_NODES) range due to the nodeid check in parse_numa_node(): if (nodenr >= MAX_NODES) { error_setg(errp, "Max number of NUMA nodes reached: %" so the subject fixup is not reachable; drop it. Signed-off-by: Igor Mammedov --- hw/core/numa.c | 4 1 file changed, 4 deletions(-) diff --git a/hw/core/numa.c b/hw/core/numa.c index 706c1e84c6..7d5d413001 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -677,10 +677,6 @@ void numa_complete_configuration(MachineState *ms) if (ms->numa_state->num_nodes > 0) { uint64_t numa_total; -if (ms->numa_state->num_nodes > MAX_NODES) { -ms->numa_state->num_nodes = MAX_NODES; -} - numa_total = 0; for (i = 0; i < ms->numa_state->num_nodes; i++) { numa_total += numa_info[i].node_mem; -- 2.27.0
[PATCH 1/3] numa: drop support for '-numa node' (without memory specified)
It has been deprecated since 4.1, commit 4bb4a2732e (numa: deprecate implict memory distribution between nodes). Users of existing VMs, wishing to preserve the same RAM distribution, should configure it explicitly using ``-numa node,memdev`` options. The current RAM distribution can be retrieved using the HMP command `info numa`, and if separate memory devices (pc|nv-dimm) are present, use `info memory-device` and subtract device memory from the output of `info numa`. Signed-off-by: Igor Mammedov --- include/hw/boards.h| 2 -- include/sysemu/numa.h | 4 --- docs/system/deprecated.rst | 23 +--- hw/core/machine.c | 1 - hw/core/numa.c | 55 -- hw/i386/pc_piix.c | 1 - hw/i386/pc_q35.c | 1 - hw/ppc/spapr.c | 1 - 8 files changed, 14 insertions(+), 74 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index bc5b82ad20..15fc1a2bac 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -208,8 +208,6 @@ struct MachineClass { strList *allowed_dynamic_sysbus_devices; bool auto_enable_numa_with_memhp; bool auto_enable_numa_with_memdev; -void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index ad58ee88f7..4173ef2afa 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -106,10 +106,6 @@ void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node, void numa_complete_configuration(MachineState *ms); void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms); extern QemuOptsList qemu_numa_opts; -void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); -void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); void numa_cpu_pre_plug(const struct CPUArchId *slot, DeviceState *dev, Error **errp); bool numa_uses_legacy_mem(void); diff --git a/docs/system/deprecated.rst 
b/docs/system/deprecated.rst index 851dbdeb8a..6f9441005a 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -104,15 +104,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa`` node (without memory specified) (since 4.1) -''''''''''''''''''''''''''''''''''''''''''''''''''''' - -Splitting RAM by default between NUMA nodes has the same issues as ``mem`` -parameter described above with the difference that the role of the user plays -QEMU using implicit generic or board specific splitting rule. -Use ``memdev`` with *memory-backend-ram* backend or ``mem`` (if -it's supported by used machine type) to define mapping explictly instead. - ``-mem-path`` fallback to RAM (since 4.1) ''''''''''''''''''''''''''''''''''''''''' @@ -602,6 +593,20 @@ error when ``-u`` is not used. Command line options +``-numa`` node (without memory specified) (removed 5.2) +''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +Splitting RAM by default between NUMA nodes had the same issues as ``mem`` +parameter with the difference that the role of the user plays QEMU using +implicit generic or board specific splitting rule. +Use ``memdev`` with *memory-backend-ram* backend or ``mem`` (if +it's supported by used machine type) to define mapping explictly instead. +Users of existing VMs, wishing to preserve the same RAM distribution, should +configure it explicitly using ``-numa node,memdev`` options. Current RAM +distribution can be retrieved using HMP command ``info numa`` and if separate +memory devices (pc|nv-dimm) are present use ``info memory-device`` and subtract +device memory from output of ``info numa``. + ``-numa node,mem=``\ *size* (removed in 5.1) ''''''''''''''''''''''
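For reference, the explicit ``-numa node,memdev`` replacement named in the commit message looks roughly like this (node count and sizes are made up for illustration):

```shell
# Split 4G of guest RAM explicitly across two NUMA nodes
# instead of relying on the removed implicit distribution.
qemu-system-x86_64 -m 4G \
  -object memory-backend-ram,id=ram0,size=2G \
  -object memory-backend-ram,id=ram1,size=2G \
  -numa node,nodeid=0,memdev=ram0 \
  -numa node,nodeid=1,memdev=ram1
```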
[PATCH 0/3] numa: cleanups for 5.2
Remove the default RAM splitting between NUMA nodes that has been deprecated since 4.1, plus a couple of minor NUMA cleanups. Igor Mammedov (3): numa: drop support for '-numa node' (without memory specified) doc: Cleanup "'-mem-path' fallback to RAM" deprecation text numa: remove fixup numa_state->num_nodes to MAX_NODES include/hw/boards.h| 2 -- include/sysemu/numa.h | 4 --- docs/system/deprecated.rst | 44 +++- hw/core/machine.c | 1 - hw/core/numa.c | 59 -- hw/i386/pc_piix.c | 1 - hw/i386/pc_q35.c | 1 - hw/ppc/spapr.c | 1 - 8 files changed, 24 insertions(+), 89 deletions(-) -- 2.27.0
[PATCH v5] numa: forbid '-numa node,mem' for 5.1 and newer machine types
The deprecation period has run out and it's time to flip the switch introduced by cd5ff8333a. Disable the legacy option for new machine types (since 5.1) and amend the documentation. '-numa node,memdev' shall be used instead of the disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik Reviewed-by: Michael S. Tsirkin Reviewed-by: Greg Kurz --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprecation text to recently removed section v3: - increase title line length for (deprecated.rst) '``-numa node,mem=``\ *size* (removed in 5.1)' v4: - use error_append_hint() for suggesting valid CLI v5: - add "\n" at the end of error_append_hint() - fix grammar/spelling in moved deprecation text CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org CC: ebl...@redhat.com CC: gr...@kaod.org --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 7 +++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 36 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 544ece0a45..72666ac764 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. 
But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -516,3 +499,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +'''''''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` was used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage a specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so the guest ends up with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actually manage node RAM on the host side. 
Use parameter ``memdev`` +with *memory-backend-ram* backend as replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +New machine versions (since 5.1) will not accept the option but it will still +work with old machine types. User can check the QAPI schema to see if the
Re: [PATCH v4] numa: forbid '-numa node,mem' for 5.1 and newer machine types
On Mon, 8 Jun 2020 08:55:08 -0400 "Michael S. Tsirkin" wrote: > On Mon, Jun 08, 2020 at 08:03:44AM -0400, Igor Mammedov wrote: > > Deprecation period is run out and it's a time to flip the switch > > introduced by cd5ff8333a. Disable legacy option for new machine > > types (since 5.1) and amend documentation. > > > > '-numa node,memdev' shall be used instead of disabled option > > with new machine types. > > > > Signed-off-by: Igor Mammedov > > Reviewed-by: Michal Privoznik > > Reviewed-by: Michael S. Tsirkin > Thanks! > numa things so I'm guessing Eduardo's tree? yep, it's pure NUMA so it should go via Eduardo's tree. > > --- > > v1: > > - rebased on top of current master > > - move compat mode from 4.2 to 5.0 > > v2: > > - move deprection text to recently removed section > > v3: > > - increase title line length for (deprecated.rst) > > '``-numa node,mem=``\ *size* (removed in 5.1)' > > v4: > > - use error_append_hint() for suggesting valid CLI > > > > CC: peter.mayd...@linaro.org > > CC: ehabk...@redhat.com > > CC: marcel.apfelb...@gmail.com > > CC: m...@redhat.com > > CC: pbonz...@redhat.com > > CC: r...@twiddle.net > > CC: da...@gibson.dropbear.id.au > > CC: libvir-list@redhat.com > > CC: qemu-...@nongnu.org > > CC: qemu-...@nongnu.org > > CC: ebl...@redhat.com > > CC: gr...@kaod.org > > --- > > docs/system/deprecated.rst | 37 - > > hw/arm/virt.c | 2 +- > > hw/core/numa.c | 7 +++ > > hw/i386/pc.c | 1 - > > hw/i386/pc_piix.c | 1 + > > hw/i386/pc_q35.c | 1 + > > hw/ppc/spapr.c | 2 +- > > qemu-options.hx| 9 + > > 8 files changed, 36 insertions(+), 24 deletions(-) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index 544ece0a45..e74a5717c8 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -101,23 +101,6 @@ error in the future. > > The ``-realtime mlock=on|off`` argument has been replaced by the > > ``-overcommit mem-lock=on|off`` argument. 
> > > > -``-numa node,mem=``\ *size* (since 4.1) > > -''''''''''''''''''''''''''''''''''''''' > > - > > -The parameter ``mem`` of ``-numa node`` is used to assign a part of > > -guest RAM to a NUMA node. But when using it, it's impossible to manage > > specified > > -RAM chunk on the host side (like bind it to a host node, setting bind > > policy, ...), > > -so guest end-ups with the fake NUMA configuration with suboptiomal > > performance. > > -However since 2014 there is an alternative way to assign RAM to a NUMA node > > -using parameter ``memdev``, which does the same as ``mem`` and adds > > -means to actualy manage node RAM on the host side. Use parameter ``memdev`` > > -with *memory-backend-ram* backend as an replacement for parameter ``mem`` > > -to achieve the same fake NUMA effect or a properly configured > > -*memory-backend-file* backend to actually benefit from NUMA configuration. > > -In future new machine versions will not accept the option but it will still > > -work with old machine types. User can check QAPI schema to see if the > > legacy > > -option is supported by looking at MachineInfo::numa-mem-supported property. > > - > > ``-numa`` node (without memory specified) (since 4.1) > > ''''''''''''''''''''''''''''''''''''''''''''''''''''' > > > > @@ -516,3 +499,23 @@ long starting at 1MiB, the old command:: > > can be rewritten as:: > > > >qemu-nbd -t --image-opts > > driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 > > + > > +Command line options > > + > > + > > +``-numa node,mem=``\ *size* (removed in 5.1) > > +'''''''''''''''''''''''''''''&
[PATCH v4] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprection text to recently removed section v3: - increase title line length for (deprecated.rst) '``-numa node,mem=``\ *size* (removed in 5.1)' v4: - use error_append_hint() for suggesting valid CLI CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org CC: ebl...@redhat.com CC: gr...@kaod.org --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 7 +++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 36 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 544ece0a45..e74a5717c8 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. 
-However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -516,3 +499,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +'''''''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actualy manage node RAM on the host side. Use parameter ``memdev`` +with *memory-backend-ram* backend as an replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +In future new machine versions will not accept the option but it will still +work with old machine types. 
User can check QAPI schema to see if the legacy +option is supported by looking at MachineInfo::numa-mem-supported property. diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c
Re: [PATCH v3] numa: forbid '-numa node,mem' for 5.1 and newer machine types
On Fri, 5 Jun 2020 18:47:58 +0200 Greg Kurz wrote: > On Fri, 5 Jun 2020 12:03:21 -0400 > Igor Mammedov wrote: > > > Deprecation period is run out and it's a time to flip the switch > > introduced by cd5ff8333a. Disable legacy option for new machine > > types (since 5.1) and amend documentation. > > > > '-numa node,memdev' shall be used instead of disabled option > > with new machine types. > > > > Signed-off-by: Igor Mammedov > > Reviewed-by: Michal Privoznik > > --- > > v1: > > - rebased on top of current master > > - move compat mode from 4.2 to 5.0 > > v2: > > - move deprection text to recently removed section > > v3: > > - increase title line length for (deprecated.rst) > > '``-numa node,mem=``\ *size* (removed in 5.1)' > > > > CC: peter.mayd...@linaro.org > > CC: ehabk...@redhat.com > > CC: marcel.apfelb...@gmail.com > > CC: m...@redhat.com > > CC: pbonz...@redhat.com > > CC: r...@twiddle.net > > CC: da...@gibson.dropbear.id.au > > CC: libvir-list@redhat.com > > CC: qemu-...@nongnu.org > > CC: qemu-...@nongnu.org > > --- > > docs/system/deprecated.rst | 37 - > > hw/arm/virt.c | 2 +- > > hw/core/numa.c | 6 ++ > > hw/i386/pc.c | 1 - > > hw/i386/pc_piix.c | 1 + > > hw/i386/pc_q35.c | 1 + > > hw/ppc/spapr.c | 2 +- > > qemu-options.hx| 9 + > > 8 files changed, 35 insertions(+), 24 deletions(-) > > > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > > index f0061f94aa..502e41ff35 100644 > > --- a/docs/system/deprecated.rst > > +++ b/docs/system/deprecated.rst > > @@ -101,23 +101,6 @@ error in the future. > > The ``-realtime mlock=on|off`` argument has been replaced by the > > ``-overcommit mem-lock=on|off`` argument. > > > > -``-numa node,mem=``\ *size* (since 4.1) > > -''''''''''''''''''''''''''''''''''''''' > > - > > -The parameter ``mem`` of ``-numa node`` is used to assign a part of > > -guest RAM to a NUMA node. 
But when using it, it's impossible to manage > > specified > > -RAM chunk on the host side (like bind it to a host node, setting bind > > policy, ...), > > -so guest end-ups with the fake NUMA configuration with suboptiomal > > performance. > > -However since 2014 there is an alternative way to assign RAM to a NUMA node > > -using parameter ``memdev``, which does the same as ``mem`` and adds > > -means to actualy manage node RAM on the host side. Use parameter ``memdev`` > > -with *memory-backend-ram* backend as an replacement for parameter ``mem`` > > -to achieve the same fake NUMA effect or a properly configured > > -*memory-backend-file* backend to actually benefit from NUMA configuration. > > -In future new machine versions will not accept the option but it will still > > -work with old machine types. User can check QAPI schema to see if the > > legacy > > -option is supported by looking at MachineInfo::numa-mem-supported property. > > - > > ``-numa`` node (without memory specified) (since 4.1) > > ''''''''''''''''''''''''''''''''''''''''''''''''''''' > > > > @@ -512,3 +495,23 @@ long starting at 1MiB, the old command:: > > can be rewritten as:: > > > >qemu-nbd -t --image-opts > > driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 > > + > > +Command line options > > + > > + > > +``-numa node,mem=``\ *size* (removed in 5.1) > > +'''''''''''''''''''''''''''''''''''''''''''' > > + > > +The parameter ``mem`` of ``-numa node`` is used to assign a part of > > +guest RAM to a NUMA node. But when using it, it's impossible to manage > > specified > > +RAM chunk on the host side (like bind it
Re: [PATCH] numa: forbid '-numa node,mem' for 5.1 and newer machine types
On Thu, 4 Jun 2020 07:22:51 -0500 Eric Blake wrote: > On 6/2/20 3:41 AM, Igor Mammedov wrote: > > Deprecation period is run out and it's a time to flip the switch > > introduced by cd5ff8333a. Disable legacy option for new machine > > types (since 5.1) and amend documentation. > > > > '-numa node,memdev' shall be used instead of disabled option > > with new machine types. > > > > Signed-off-by: Igor Mammedov > > --- > > - rebased on top of current master > > - move compat mode from 4.2 to 5.0 > > > > > docs/system/deprecated.rst | 17 - > > Lately, when we remove something, we've been moving the documentation > from 'will be deprecated' to a later section of the document 'has been > removed', so that the history is not lost. But this diffstat says you > just deleted, rather than moved, that hunk. > I didn't know that, I'll send v2 with this hunk moved to removed section
[PATCH v2] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprection text to recently removed section - pick up reviewed-bys CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org CC: ebl...@redhat.com --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 35 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index f0061f94aa..6f717e4a1d 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. 
Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -512,3 +495,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actualy manage node RAM on the host side. Use parameter ``memdev`` +with *memory-backend-ram* backend as an replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +In future new machine versions will not accept the option but it will still +work with old machine types. User can check QAPI schema to see if the legacy +option is supported by looking at MachineInfo::numa-mem-supported property. 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2262,7 +2262,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc->pre_plug = virt_machine_device_pre_plug_cb; hc->plug = virt_machine_device_plug_cb;
[PATCH v3] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov Reviewed-by: Michal Privoznik --- v1: - rebased on top of current master - move compat mode from 4.2 to 5.0 v2: - move deprection text to recently removed section v3: - increase title line length for (deprecated.rst) '``-numa node,mem=``\ *size* (removed in 5.1)' CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org --- docs/system/deprecated.rst | 37 - hw/arm/virt.c | 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 35 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index f0061f94aa..502e41ff35 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. 
Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -512,3 +495,23 @@ long starting at 1MiB, the old command:: can be rewritten as:: qemu-nbd -t --image-opts driver=raw,offset=1M,size=100M,file.driver=qcow2,file.file.driver=file,file.file.filename=file.qcow2 + +Command line options + + +``-numa node,mem=``\ *size* (removed in 5.1) +'''''''''''''''''''''''''''''''''''''''''''' + +The parameter ``mem`` of ``-numa node`` is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage specified +RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), +so guest end-ups with the fake NUMA configuration with suboptiomal performance. +However since 2014 there is an alternative way to assign RAM to a NUMA node +using parameter ``memdev``, which does the same as ``mem`` and adds +means to actualy manage node RAM on the host side. Use parameter ``memdev`` +with *memory-backend-ram* backend as an replacement for parameter ``mem`` +to achieve the same fake NUMA effect or a properly configured +*memory-backend-file* backend to actually benefit from NUMA configuration. +In future new machine versions will not accept the option but it will still +work with old machine types. User can check QAPI schema to see if the legacy +option is supported by looking at MachineInfo::numa-mem-supported property. 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2262,7 +2262,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc-&g
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Thu, 4 Jun 2020 10:58:01 +0200 Michal Privoznik wrote: > On 5/27/20 3:58 PM, Igor Mammedov wrote: > > On Tue, 26 May 2020 17:31:09 +0200 > > Michal Privoznik wrote: > > > >> On 5/26/20 4:51 PM, Igor Mammedov wrote: > >>> On Mon, 25 May 2020 10:05:08 +0200 > >>> Michal Privoznik wrote: > >>> > >>>> > >>>> This is a problem. The domain XML that is provided can't be changed, > >>>> mostly because mgmt apps construct it on the fly and then just pass it > >>>> as a RO string to libvirt. While libvirt could create a separate cache, > >>>> there has to be a better way. > >>>> > >>>> I mean, I can add some more code that once the guest is running > >>>> preserves the mapping during migration. But that assumes a running QEMU. > >>>> When starting a domain from scratch, is it acceptable it vCPU topology > >>>> changes? I suspect it is not. > >>> I'm not sure I got you but > >>> vCPU topology isn't changnig but when starting QEMU, user has to map > >>> 'concrete vCPUs' to spencific numa nodes. The issue here is that > >>> to specify concrete vCPUs user needs to get layout from QEMU first > >>> as it's a function of target/machine/-smp and possibly cpu type. > >> > >> Assume the following config: 4 vCPUs (2 sockets, 2 cores, 1 thread > >> topology) and 2 NUMA nodes and the following assignment to NUMA: > >> > >> node 0: cpus=0-1 > >> node 1: cpus=2-3 > >> > >> With old libvirt & qemu (and assuming x86_64 - not EPYC), I assume the > >> following topology is going to be used: > >> > >> node 0: socket=0,core=0,thread=0 (vCPU0) socket=0,core=1,thread=0 (vCPU1) > >> node 1: socket=1,core=0,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) > >> > >> Now, user upgrades libvirt & qemu but doesn't change the config. 
And on > >> a fresh new start (no migration), they might get a different topology: > >> > >> node 0: socket=0,core=0,thread=0 (vCPU0) socket=1,core=0,thread=0 (vCPU1) > >> node 1: socket=0,core=1,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) > >> > > > > that shouldn't happen at least for as long as machine version stays the > > same > > Shouldn't as in it's bad if it happens or as in QEMU won't change > topology for released machine types? it's the second > Well, we are talking about libvirt > generating the topology. > > >> The problem here is not how to assign vCPUs to NUMA nodes, the problem > >> is how to translate vCPU IDs to socket=,core=,thread=. > > if you are talking about libvirt's vCPU IDs, then it's separate issue > > as it's user facing API, I think it should not rely on cpu_index. > > Instead it should map vCPU IDs to ([socket,]core[,thread]) tuple > > or maybe drop notion of vCPU IDs and expose ([socket,]core[,thread]) > > to users if they ask for numa aware config. > > And this is the thing I am asking. How to map vCPU IDs to > socket,core,thread and how to do it reliably. vCPU ID has the same drawbacks as cpu_index in QEMU, it provides zero information about topology. Which is fine in non NUMA case since user doesn't care about topology at all (I'm assuming it's libvirt who does pinning and it would use topology info to pin vcpus correctly). But for NUMA case, as a user I'd like to see/use topology instead of vCPU ID, especially if user is in charge of assigning vCPUs to nodes. I'd drop vCPU IDs concept altogether and use ([socket,]core[,thread]) tuple to describe vCPUs instead. It should work fine for both usecases and you wouldn't have to do mapping to vCPU IDs. (I'm talking here about new configs that use new machine types and ignore compatibility. More on the later see below) > > > > PS: > > I'm curious how libvirt currently implements numa mapping and > > how it's correlated with pinnig to host nodes? 
> > Does it have any sort of code to calculate topology based on cpu_index > > so it could properly assign vCPUs to nodes or all the pain of > > assigning vCPU IDs to nodes is on the user shoulders? > > It's on users. In the domain XML they specify number of vCPUs, and then > they can assign individual IDs to NUMA nodes. For instance: > >8 > > > > > > > > > translates to: > >-smp 8,sockets=8,cores=1,threads=1 >-numa node,nodeid
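The socket/core/thread layouts debated in this thread follow a simple linear decomposition on x86. A minimal sketch, assuming x86-style ordering where the thread index varies fastest and the socket index slowest (as in QEMU's x86_topo_ids_from_idx()); other machine types may order CPUs differently, which is exactly why the thread recommends querying QEMU rather than hardcoding this:

```python
# Linear vCPU-index -> (socket, core, thread) decomposition, assuming
# x86-style ordering (thread varies fastest, socket slowest).  Other
# machine types may use a different layout.
def vcpu_topo(idx: int, sockets: int, cores: int, threads: int):
    thread = idx % threads
    core = (idx // threads) % cores
    socket = idx // (cores * threads)
    assert socket < sockets, "vCPU index out of range for this topology"
    return socket, core, thread

# The 2-socket / 2-core / 1-thread example from the thread:
# vCPUs 0-1 land on socket 0 (NUMA node 0), vCPUs 2-3 on socket 1.
layout = [vcpu_topo(i, 2, 2, 1) for i in range(4)]
# layout == [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
```

This reproduces the "old libvirt & qemu" layout Michal quotes above; the "different topology" he worries about corresponds to a machine type that enumerates CPUs in another order.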
Re: [PATCH 5/5] qemu_validate.c: revert NUMA CPU range user warning
On Mon, 1 Jun 2020 17:14:20 -0300 Daniel Henrique Barboza wrote: > On 6/1/20 4:40 PM, Peter Krempa wrote: > > On Mon, Jun 01, 2020 at 14:50:41 -0300, Daniel Henrique Barboza wrote: > >> Now that we have the auto-fill code in place, and with proper documentation > >> to let the user know that (1) we will auto-fill the NUMA cpus up to the > >> number to maximum VCPUs number if QEMU supports it and (2) the user > >> is advised to always supply a complete NUMA topology, this warning > >> is unneeded. > >> > >> This reverts commit 38d2e033686b5cc274f8f55075ce1985b71e329a. > > > > Since we already have the validation in place for some time now I think > > we should just keep it. The auto-filling would be a useful hack to work > > around if config breaks, but judged by itself it's of questionable > > benefit. > > That's a good point. I agree that removing the message after being in place > for this long is more trouble than it's worth. > > > > > Specifically users might end up with a topology which they didn't > > expect. Reasoning is basically the same as with qemu. Any default > > behaviour here is a policy decision and it might not suit all uses. > > > > > An ideal situation would be QEMU to never accept incomplete NUMA topologies > in the first place. At least with your series I can safely drop deprecated incomplete NUMA topologies on QEMU side (which were producing warnings for a while) > > Given that this wasn't the case and now there might be a plethora of guests > running with goofy topologies all around, the already existing warning > message + this auto-fill hack + documentation mentioning that users should > avoid these topologies is a fine solution from Libvirt side, in my > estimation. > > > Thanks, > > > DHB >
[PATCH] numa: forbid '-numa node,mem' for 5.1 and newer machine types
Deprecation period is run out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types (since 5.1) and amend documentation. '-numa node,memdev' shall be used instead of disabled option with new machine types. Signed-off-by: Igor Mammedov --- - rebased on top of current master - move compat mode from 4.2 to 5.0 CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org --- docs/system/deprecated.rst | 17 - hw/arm/virt.c | 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-options.hx| 9 + 8 files changed, 15 insertions(+), 24 deletions(-) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index f0061f94aa..57edc075c2 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -101,23 +101,6 @@ error in the future. The ``-realtime mlock=on|off`` argument has been replaced by the ``-overcommit mem-lock=on|off`` argument. -``-numa node,mem=``\ *size* (since 4.1) -''''''''''''''''''''''''''''''''''''''' - -The parameter ``mem`` of ``-numa node`` is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to a host node, setting bind policy, ...), -so guest end-ups with the fake NUMA configuration with suboptiomal performance. -However since 2014 there is an alternative way to assign RAM to a NUMA node -using parameter ``memdev``, which does the same as ``mem`` and adds -means to actualy manage node RAM on the host side. 
Use parameter ``memdev`` -with *memory-backend-ram* backend as an replacement for parameter ``mem`` -to achieve the same fake NUMA effect or a properly configured -*memory-backend-file* backend to actually benefit from NUMA configuration. -In future new machine versions will not accept the option but it will still -work with old machine types. User can check QAPI schema to see if the legacy -option is supported by looking at MachineInfo::numa-mem-supported property. - ``-numa`` node (without memory specified) (since 4.1) ''''''''''''''''''''''''''''''''''''''''''''''''''''' diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 37462a6f78..063d4703f7 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2262,7 +2262,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc->pre_plug = virt_machine_device_pre_plug_cb; hc->plug = virt_machine_device_plug_cb; hc->unplug_request = virt_machine_device_unplug_request_cb; -mc->numa_mem_supported = true; mc->nvdimm_supported = true; mc->auto_enable_numa_with_memhp = true; mc->default_ram_id = "mach-virt.ram"; @@ -2375,6 +2374,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 1) static void virt_machine_5_0_options(MachineClass *mc) { virt_machine_5_1_options(mc); +mc->numa_mem_supported = true; } DEFINE_VIRT_MACHINE(5, 0) diff --git a/hw/core/numa.c b/hw/core/numa.c index 316bc50d75..05be412e59 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -117,6 +117,12 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, } if (node->has_mem) { +if (!mc->numa_mem_supported) { +error_setg(errp, "Parameter -numa node,mem is not supported by this" + " machine type. 
Use -numa node,memdev instead"); +return; +} + numa_info[nodenr].node_mem = node->mem; if (!qtest_enabled()) { warn_report("Parameter -numa node,mem is deprecated," diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2128f3d6fe..a86136069c 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1960,7 +1960,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) hc->unplug = pc_machine_device_unplug_cb; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; -mc->numa_mem_supported = true; mc->default_ram_id = "pc.ram"; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix
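For readers tracking the CLI change in the patch above, a sketch of the move from the deprecated option to its replacement (sizes and object IDs here are illustrative, following the documentation text being amended):

```sh
# Deprecated form (only accepted by pre-5.1 machine types):
qemu-system-x86_64 -m 4G \
    -numa node,nodeid=0,mem=2G -numa node,nodeid=1,mem=2G

# Replacement: back each node with an explicit memory backend.
# memory-backend-ram reproduces the old "fake NUMA" behaviour;
# use memory-backend-file to actually bind node RAM on the host.
qemu-system-x86_64 -m 4G \
    -object memory-backend-ram,id=ram-node0,size=2G \
    -object memory-backend-ram,id=ram-node1,size=2G \
    -numa node,nodeid=0,memdev=ram-node0 \
    -numa node,nodeid=1,memdev=ram-node1
```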
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Tue, 26 May 2020 17:31:09 +0200 Michal Privoznik wrote: > On 5/26/20 4:51 PM, Igor Mammedov wrote: > > On Mon, 25 May 2020 10:05:08 +0200 > > Michal Privoznik wrote: > > > >> > >> This is a problem. The domain XML that is provided can't be changed, > >> mostly because mgmt apps construct it on the fly and then just pass it > >> as a RO string to libvirt. While libvirt could create a separate cache, > >> there has to be a better way. > >> > >> I mean, I can add some more code that once the guest is running > >> preserves the mapping during migration. But that assumes a running QEMU. > >> When starting a domain from scratch, is it acceptable it vCPU topology > >> changes? I suspect it is not. > > I'm not sure I got you but > > vCPU topology isn't changnig but when starting QEMU, user has to map > > 'concrete vCPUs' to spencific numa nodes. The issue here is that > > to specify concrete vCPUs user needs to get layout from QEMU first > > as it's a function of target/machine/-smp and possibly cpu type. > > Assume the following config: 4 vCPUs (2 sockets, 2 cores, 1 thread > topology) and 2 NUMA nodes and the following assignment to NUMA: > > node 0: cpus=0-1 > node 1: cpus=2-3 > > With old libvirt & qemu (and assuming x86_64 - not EPYC), I assume the > following topology is going to be used: > > node 0: socket=0,core=0,thread=0 (vCPU0) socket=0,core=1,thread=0 (vCPU1) > node 1: socket=1,core=0,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) > > Now, user upgrades libvirt & qemu but doesn't change the config. 
And on > a fresh new start (no migration), they might get a different topology: > > node 0: socket=0,core=0,thread=0 (vCPU0) socket=1,core=0,thread=0 (vCPU1) > node 1: socket=0,core=1,thread=0 (vCPU2) socket=1,core=1,thread=0 (vCPU3) that shouldn't happen, at least for as long as the machine version stays the same > (This is a very trivial example that I am intentionally making look bad, > but the thing is, there are some CPUs with very weird vCPU -> > socket/core/thread mappings). > > The problem here is that with this new version it is libvirt who > configured the vCPU -> NUMA mapping (using -numa cpu). Why so wrong? > Well it had no way to ask qemu how it used to be. Okay, so we add an > interface to QEMU (say -preconfig + query-hotpluggable-cpus) which will > do the mapping and keep it there indefinitely. But if the interface is > already there (and "always" will be), I don't see need for the extra > step (libvirt asking QEMU for the old mapping). With cpu_index, users don't know which CPUs they assign where, and in some cases (spapr) it doesn't really map onto the board's supported CPU model well. We can add and keep cpu_index in query-hotpluggable-cpus to help with migration for old machine types from the old CLI to the new one, but otherwise cpu_index would disappear from the user-visible interface. I'd like to drop the duplicate code supporting the ambiguous '-numa node,cpus' (a not always properly working interface) and keep only a single variant, '-numa cpu=', to do NUMA mapping, which uses the CPU's topology properties to describe CPUs and unifies it with the way it's done with CPU hotplug. > The problem here is not how to assign vCPUs to NUMA nodes, the problem > is how to translate vCPU IDs to socket=,core=,thread=. if you are talking about libvirt's vCPU IDs, then it's a separate issue as it's a user-facing API, I think it should not rely on cpu_index.
Instead it should map vCPU IDs to a ([socket,]core[,thread]) tuple, or maybe drop the notion of vCPU IDs and expose ([socket,]core[,thread]) to users if they ask for a NUMA-aware config. PS: I'm curious how libvirt currently implements NUMA mapping and how it's correlated with pinning to host nodes? Does it have any sort of code to calculate topology based on cpu_index so it could properly assign vCPUs to nodes, or is all the pain of assigning vCPU IDs to nodes on the user's shoulders? > > that applies not only '-numa cpu' but also to -device cpufoo, > > that's why query-hotpluggable-cpus was introduced to let > > user get the list of possible CPUs (including topo properties needed to > > create them) for a given set of CLI options. > > > > If I recall right libvirt uses topo properties during cpu hotplug but > > treats it mainly as opaque info so it could feed it back to QEMU. > > > > > >>>> tries to avoid that as much as it can. > >>>> > >>>>> > >>>>> How to present it to libvirt user I'm not sure (give them that list > >>
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Mon, 25 May 2020 10:05:08 +0200 Michal Privoznik wrote: > On 5/22/20 7:18 PM, Igor Mammedov wrote: > > On Fri, 22 May 2020 18:28:31 +0200 > > Michal Privoznik wrote: > > > >> On 5/22/20 6:07 PM, Igor Mammedov wrote: > >>> On Fri, 22 May 2020 16:14:14 +0200 > >>> Michal Privoznik wrote: > >>> > >>>> QEMU is trying to obsolete -numa node,cpus= because that uses > >>>> ambiguous vCPU id to [socket, die, core, thread] mapping. The new > >>>> form is: > >>>> > >>>> -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T > >>>> > >>>> which is repeated for every vCPU and places it at [S, D, C, T] > >>>> into guest NUMA node N. > >>>> > >>>> While in general this is magic mapping, we can deal with it. > >>>> Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology > >>>> is given then maxvcpus must be sockets * dies * cores * threads > >>>> (i.e. there are no 'holes'). > >>>> Secondly, if no topology is given then libvirt itself places each > >>>> vCPU into a different socket (basically, it fakes topology of: > >>>> [maxvcpus, 1, 1, 1]) > >>>> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs > >>>> onto topology, to make sure vCPUs don't start to move around. > >>>> > >>>> Note, migration from old to new cmd line works and therefore > >>>> doesn't need any special handling. 
> >>>> > >>>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085 > >>>> > >>>> Signed-off-by: Michal Privoznik > >>>> --- > >>>>src/qemu/qemu_command.c | 108 +- > >>>>.../hugepages-nvdimm.x86_64-latest.args | 4 +- > >>>>...memory-default-hugepage.x86_64-latest.args | 10 +- > >>>>.../memfd-memory-numa.x86_64-latest.args | 10 +- > >>>>...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +- > >>>>...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +- > >>>>...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +- > >>>>...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +- > >>>>...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +- > >>>>...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +- > >>>>.../memory-hotplug-nvdimm.x86_64-latest.args | 4 +- > >>>>...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +- > >>>>...vhost-user-fs-hugepages.x86_64-latest.args | 4 +- > >>>>...host-user-gpu-secondary.x86_64-latest.args | 3 +- > >>>>.../vhost-user-vga.x86_64-latest.args | 3 +- > >>>>15 files changed, 158 insertions(+), 16 deletions(-) > >>>> > >>>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > >>>> index 7d84fd8b5e..0de4fe4905 100644 > >>>> --- a/src/qemu/qemu_command.c > >>>> +++ b/src/qemu/qemu_command.c > >>>> @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf, > >>>>} > >>>> > >>>> > >>>> +/** > >>>> + * qemuTranlsatevCPUID: > >>>> + * > >>>> + * For given vCPU @id and vCPU topology (@cpu) compute corresponding > >>>> + * @socket, @die, @core and @thread). This assumes linear topology, > >>>> + * that is every [socket, die, core, thread] combination is valid vCPU > >>>> + * ID and there are no 'holes'. This is ensured by > >>>> + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is > >>>> + * set. > >>> I wouldn't make this assumption, each machine can have (and has) it's own > >>> layout, > >>> and now it's not hard to change that per machine version if necessary. 
> >>> > >>> I'd suppose one could pull the list of possible CPUs from QEMU started > >>> in preconfig mode with desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS > >>> and then continue to configure numa with QMP commands using provided > >>> CPUs layout. > >> > >> Continue where? At the 'preconfig mode' the guest is already started, > >> isn't it? Are you suggesting that libvirt starts a dummy QEMU process
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Fri, 22 May 2020 18:28:31 +0200 Michal Privoznik wrote: > On 5/22/20 6:07 PM, Igor Mammedov wrote: > > On Fri, 22 May 2020 16:14:14 +0200 > > Michal Privoznik wrote: > > > >> QEMU is trying to obsolete -numa node,cpus= because that uses > >> ambiguous vCPU id to [socket, die, core, thread] mapping. The new > >> form is: > >> > >>-numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T > >> > >> which is repeated for every vCPU and places it at [S, D, C, T] > >> into guest NUMA node N. > >> > >> While in general this is magic mapping, we can deal with it. > >> Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology > >> is given then maxvcpus must be sockets * dies * cores * threads > >> (i.e. there are no 'holes'). > >> Secondly, if no topology is given then libvirt itself places each > >> vCPU into a different socket (basically, it fakes topology of: > >> [maxvcpus, 1, 1, 1]) > >> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs > >> onto topology, to make sure vCPUs don't start to move around. > >> > >> Note, migration from old to new cmd line works and therefore > >> doesn't need any special handling. 
> >> > >> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085 > >> > >> Signed-off-by: Michal Privoznik > >> --- > >> src/qemu/qemu_command.c | 108 +- > >> .../hugepages-nvdimm.x86_64-latest.args | 4 +- > >> ...memory-default-hugepage.x86_64-latest.args | 10 +- > >> .../memfd-memory-numa.x86_64-latest.args | 10 +- > >> ...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +- > >> ...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +- > >> ...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +- > >> ...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +- > >> ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +- > >> ...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +- > >> .../memory-hotplug-nvdimm.x86_64-latest.args | 4 +- > >> ...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +- > >> ...vhost-user-fs-hugepages.x86_64-latest.args | 4 +- > >> ...host-user-gpu-secondary.x86_64-latest.args | 3 +- > >> .../vhost-user-vga.x86_64-latest.args | 3 +- > >> 15 files changed, 158 insertions(+), 16 deletions(-) > >> > >> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > >> index 7d84fd8b5e..0de4fe4905 100644 > >> --- a/src/qemu/qemu_command.c > >> +++ b/src/qemu/qemu_command.c > >> @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf, > >> } > >> > >> > >> +/** > >> + * qemuTranlsatevCPUID: > >> + * > >> + * For given vCPU @id and vCPU topology (@cpu) compute corresponding > >> + * @socket, @die, @core and @thread). This assumes linear topology, > >> + * that is every [socket, die, core, thread] combination is valid vCPU > >> + * ID and there are no 'holes'. This is ensured by > >> + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is > >> + * set. > > I wouldn't make this assumption, each machine can have (and has) it's own > > layout, > > and now it's not hard to change that per machine version if necessary. 
> > > > I'd suppose one could pull the list of possible CPUs from QEMU started > > in preconfig mode with desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS > > and then continue to configure numa with QMP commands using provided > > CPUs layout. > > Continue where? At the 'preconfig mode' the guest is already started, > isn't it? Are you suggesting that libvirt starts a dummy QEMU process, > fetches the CPU topology from it and then starts it for real? Libvirt QEMU is started, but it's very far from starting the guest; at that point it's possible to configure the NUMA mapping at runtime and continue to the -S or running state without restarting QEMU. For follow-up starts, the used topology and numa options can be cached and reused at CLI time as long as the machine/-smp combination stays the same. > tries to avoid that as much as it can. > > > > How to present it to libvirt user I'm not sure (give them that list perhaps > > and let them select from it???) > > This is what I am trying to figure out in the cover letter. Maybe we > need to let users configure the topology (well, vCPU id to [socket, die, &
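The preconfig flow Igor sketches — start QEMU paused before machine init, query the possible CPUs, then wire up NUMA over QMP — looks roughly like this. The exact invocation is a sketch of QEMU's -preconfig/QMP interface of that era, not a verbatim transcript:

```sh
# Start QEMU in preconfig mode; the machine is not built yet, so NUMA
# CPU bindings may still be configured over QMP.
qemu-system-x86_64 -preconfig -S \
    -qmp unix:/tmp/qmp.sock,server,nowait \
    -machine q35 -smp 4,sockets=2,cores=2,threads=1

# Then, over the QMP socket:
#   {"execute": "qmp_capabilities"}
#   {"execute": "query-hotpluggable-cpus"}   <- returns the valid
#       [socket, core, thread] tuples for this machine/-smp combo
#   {"execute": "set-numa-node", "arguments":
#       {"type": "cpu", "node-id": 0, "socket-id": 0,
#        "core-id": 0, "thread-id": 0}}      <- repeat per vCPU
#   {"execute": "x-exit-preconfig"}          <- proceed with startup
```

This is the "no dummy QEMU process" point: the same process that answers query-hotpluggable-cpus goes on to run the guest.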
Re: [PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=
On Fri, 22 May 2020 16:14:14 +0200 Michal Privoznik wrote: > QEMU is trying to obsolete -numa node,cpus= because that uses > ambiguous vCPU id to [socket, die, core, thread] mapping. The new > form is: > > -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T > > which is repeated for every vCPU and places it at [S, D, C, T] > into guest NUMA node N. > > While in general this is magic mapping, we can deal with it. > Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology > is given then maxvcpus must be sockets * dies * cores * threads > (i.e. there are no 'holes'). > Secondly, if no topology is given then libvirt itself places each > vCPU into a different socket (basically, it fakes topology of: > [maxvcpus, 1, 1, 1]) > Thirdly, we can copy whatever QEMU is doing when mapping vCPUs > onto topology, to make sure vCPUs don't start to move around. > > Note, migration from old to new cmd line works and therefore > doesn't need any special handling. > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085 > > Signed-off-by: Michal Privoznik > --- > src/qemu/qemu_command.c | 108 +- > .../hugepages-nvdimm.x86_64-latest.args | 4 +- > ...memory-default-hugepage.x86_64-latest.args | 10 +- > .../memfd-memory-numa.x86_64-latest.args | 10 +- > ...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +- > ...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +- > ...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +- > ...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +- > ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +- > ...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +- > .../memory-hotplug-nvdimm.x86_64-latest.args | 4 +- > ...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +- > ...vhost-user-fs-hugepages.x86_64-latest.args | 4 +- > ...host-user-gpu-secondary.x86_64-latest.args | 3 +- > .../vhost-user-vga.x86_64-latest.args | 3 +- > 15 files changed, 158 insertions(+), 16 deletions(-) > > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c > 
index 7d84fd8b5e..0de4fe4905 100644 > --- a/src/qemu/qemu_command.c > +++ b/src/qemu/qemu_command.c > @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf, > } > > > +/** > + * qemuTranlsatevCPUID: > + * > + * For given vCPU @id and vCPU topology (@cpu) compute corresponding > + * @socket, @die, @core and @thread). This assumes linear topology, > + * that is every [socket, die, core, thread] combination is valid vCPU > + * ID and there are no 'holes'. This is ensured by > + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is > + * set. I wouldn't make this assumption, each machine can have (and has) it's own layout, and now it's not hard to change that per machine version if necessary. I'd suppose one could pull the list of possible CPUs from QEMU started in preconfig mode with desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS and then continue to configure numa with QMP commands using provided CPUs layout. How to present it to libvirt user I'm not sure (give them that list perhaps and let select from it???) But it's irrelevant, to the patch, magical IDs for socket/core/...whatever should not be generated by libvirt anymore, but rather taken from QEMU for given machine + -smp combination. CCing Peter, as I vaguely recall him working on this issue (preconfig + numa over QMP) > + * Moreover, if @diesSupported is false (QEMU lacks > + * QEMU_CAPS_SMP_DIES) then @die is set to zero and @socket is > + * computed without taking numbed of dies into account. > + * > + * The algorithm is shamelessly copied over from QEMU's > + * x86_topo_ids_from_idx() and its history (before introducing dies). 
> + */ > +static void > +qemuTranlsatevCPUID(unsigned int id, > +bool diesSupported, > +virCPUDefPtr cpu, > +unsigned int *socket, > +unsigned int *die, > +unsigned int *core, > +unsigned int *thread) > +{ > +if (cpu && cpu->sockets) { > +*thread = id % cpu->threads; > +*core = id / cpu->threads % cpu->cores; > +if (diesSupported) { > +*die = id / (cpu->cores * cpu->threads) % cpu->dies; > +*socket = id / (cpu->dies * cpu->cores * cpu->threads); > +} else { > +*die = 0; > +*socket = id / (cpu->cores * cpu->threads) % cpu->sockets; > +} > +} else { > +/* If no topology was provided, then qemuBuildSmpCommandLine() > + * puts all vCPUs into a separate socket. */ > +*thread = 0; > +*core = 0; > +*die = 0; > +*socket = id; > +} > +} > + > + > +static void > +qemuBuildNumaNewCPUs(virCommandPtr cmd, > + virCPUDefPtr cpu, > + virBitmapPtr cpumask, > + size_t nodeid, > + virQEMUCapsPtr qemuCap
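As a cross-check of the translation logic in the patch above, here is a direct Python rendering of its qemuTranlsatevCPUID() helper (the function name, arithmetic, and linear-topology assumption all come from the patch; this is only a sketch to make the math easy to poke at):

```python
# Python rendering of the patch's qemuTranlsatevCPUID() logic:
# decompose a linear vCPU id into (socket, die, core, thread).
# When dies are not supported, die is 0 and the socket is computed
# without the dies factor, as in the C code.
def translate_vcpu_id(vcpu_id, sockets, dies, cores, threads,
                      dies_supported=True):
    if sockets:
        thread = vcpu_id % threads
        core = vcpu_id // threads % cores
        if dies_supported:
            die = vcpu_id // (cores * threads) % dies
            socket = vcpu_id // (dies * cores * threads)
        else:
            die = 0
            socket = vcpu_id // (cores * threads) % sockets
    else:
        # No topology given: qemuBuildSmpCommandLine() puts every
        # vCPU into its own socket.
        thread = core = die = 0
        socket = vcpu_id
    return socket, die, core, thread
```

For example, with -smp 8,sockets=2,dies=1,cores=2,threads=2, vCPU 5 maps to socket 1, die 0, core 0, thread 1.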
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node,mem' for 5.0 and newer machine types
On Thu, 16 Jan 2020 13:06:28 + Daniel P. Berrangé wrote: > On Thu, Jan 16, 2020 at 01:37:03PM +0100, Igor Mammedov wrote: > > On Thu, 16 Jan 2020 11:42:09 +0100 > > Michal Privoznik wrote: > > > > > On 1/15/20 5:52 PM, Igor Mammedov wrote: > > > > On Wed, 15 Jan 2020 16:34:53 +0100 > > > > Peter Krempa wrote: > > > > > > > >> On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > > > >>> Deprecation period is ran out and it's a time to flip the switch > > > >>> introduced by cd5ff8333a. > > > >>> Disable legacy option for new machine types and amend documentation. > > > >>> > > > >>> Signed-off-by: Igor Mammedov > > > >>> --- > > > >>> CC: peter.mayd...@linaro.org > > > >>> CC: ehabk...@redhat.com > > > >>> CC: marcel.apfelb...@gmail.com > > > >>> CC: m...@redhat.com > > > >>> CC: pbonz...@redhat.com > > > >>> CC: r...@twiddle.net > > > >>> CC: da...@gibson.dropbear.id.au > > > >>> CC: libvir-list@redhat.com > > > >>> CC: qemu-...@nongnu.org > > > >>> CC: qemu-...@nongnu.org > > > >>> --- > > > >>> hw/arm/virt.c| 2 +- > > > >>> hw/core/numa.c | 6 ++ > > > >>> hw/i386/pc.c | 1 - > > > >>> hw/i386/pc_piix.c| 1 + > > > >>> hw/i386/pc_q35.c | 1 + > > > >>> hw/ppc/spapr.c | 2 +- > > > >>> qemu-deprecated.texi | 16 > > > >>> qemu-options.hx | 8 > > > >>> 8 files changed, 14 insertions(+), 23 deletions(-) > > > >> > > > >> I'm afraid nobody bothered to fix it yet: > > > >> > > > >> https://bugzilla.redhat.com/show_bug.cgi?id=1783355 > > > > > > > > It's time to start working on it :) > > > > (looks like just deprecating stuff isn't sufficient motivation, > > > > maybe actual switch flipping would work out better) > > > > > > > > > > So how was the upgrade from older to newer version resolved? I mean, if > > > the old qemu used -numa node,mem=XXX and it is migrated to a host with > > > newer qemu, the cmd line can't be switched to -numa node,memdev=node0, > > > can it? I'm asking because I've just started working on this. 
> > see commit cd5ff8333a3c87 for detailed info. > > Short answer is it's not really resolved [*], > > -numa node,mem will keep working on newer QEMU but only for old machine > > types > > new machine types will accept only -numa node,memdev. > > > > One can check if 'mem=' is supported by using QAPI query-machines > > and checking numa-mem-supported field. That field is flipped to false > > for 5.0 and later machine types in this patch. > > Since libvirt dropped the ball here, can we postpone this change > to machine types until a later release. Looks like we have to at this point. We can do this for the [82-86/86] patches, which are mostly NUMA-related changes. The rest could go in this release as it is independent of NUMA; it mainly introduces a memdev backend for main RAM and consolidates the twisted main RAM allocation logic. > > Regards, > Daniel
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node,mem' for 5.0 and newer machine types
On Thu, 16 Jan 2020 14:03:12 +0100 Michal Privoznik wrote: > On 1/16/20 1:37 PM, Igor Mammedov wrote: > > On Thu, 16 Jan 2020 11:42:09 +0100 > > Michal Privoznik wrote: > > > >> On 1/15/20 5:52 PM, Igor Mammedov wrote: > >>> On Wed, 15 Jan 2020 16:34:53 +0100 > >>> Peter Krempa wrote: > >>> > >>>> On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > >>>>> Deprecation period is ran out and it's a time to flip the switch > >>>>> introduced by cd5ff8333a. > >>>>> Disable legacy option for new machine types and amend documentation. > >>>>> > >>>>> Signed-off-by: Igor Mammedov > >>>>> --- > >>>>> CC: peter.mayd...@linaro.org > >>>>> CC: ehabk...@redhat.com > >>>>> CC: marcel.apfelb...@gmail.com > >>>>> CC: m...@redhat.com > >>>>> CC: pbonz...@redhat.com > >>>>> CC: r...@twiddle.net > >>>>> CC: da...@gibson.dropbear.id.au > >>>>> CC: libvir-list@redhat.com > >>>>> CC: qemu-...@nongnu.org > >>>>> CC: qemu-...@nongnu.org > >>>>> --- > >>>>>hw/arm/virt.c| 2 +- > >>>>>hw/core/numa.c | 6 ++ > >>>>>hw/i386/pc.c | 1 - > >>>>>hw/i386/pc_piix.c| 1 + > >>>>>hw/i386/pc_q35.c | 1 + > >>>>>hw/ppc/spapr.c | 2 +- > >>>>>qemu-deprecated.texi | 16 > >>>>>qemu-options.hx | 8 > >>>>>8 files changed, 14 insertions(+), 23 deletions(-) > >>>> > >>>> I'm afraid nobody bothered to fix it yet: > >>>> > >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1783355 > >>> > >>> It's time to start working on it :) > >>> (looks like just deprecating stuff isn't sufficient motivation, > >>> maybe actual switch flipping would work out better) > >>> > >> > >> So how was the upgrade from older to newer version resolved? I mean, if > >> the old qemu used -numa node,mem=XXX and it is migrated to a host with > >> newer qemu, the cmd line can't be switched to -numa node,memdev=node0, > >> can it? I'm asking because I've just started working on this. > > > > see commit cd5ff8333a3c87 for detailed info. 
> > Short answer is it's not really resolved [*], > > -numa node,mem will keep working on newer QEMU but only for old machine > > types > > new machine types will accept only -numa node,memdev. > > > > One can check if 'mem=' is supported by using QAPI query-machines > > and checking numa-mem-supported field. That field is flipped to false > > for 5.0 and later machine types in this patch. > > Alright, so what we can do is the following: > > 1) For new machine types (pc-5.0/q35-5.0 and newer) use memdev= always. it's not only x86, it's for all machines that support NUMA; hence numa-mem-supported was introduced, to make it easier for libvirt to figure out when to use which syntax. The plan was to release libvirt with support for numa-mem-supported and then, when newer QEMU forbids 'mem=', the change would be transparent for relatively fresh libvirt. Whether that still makes sense is questionable, though. We could go with your suggestion, in which case libvirt unilaterally switches to using only 'memdev' for 5.0 machine types and then later (5.1 or so) we release a QEMU that enforces it. In that case we can axe numa-mem-supported (I'd volunteer) to avoid supporting yet another ABI/smart logic where your way could be sufficient. Daniel, what's your take on Michal's approach? > 2) For older machine types, we are stuck with mem= until qemu is capable > of migrating from mem= to memdev= > > I think this is a safe thing to do since migrating from one version of a > machine type to another is not supported (since it can change guest > ABI). And we will see how much 2) bothers us. Does this sound reasonable? > > Michal > >
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node,mem' for 5.0 and newer machine types
On Thu, 16 Jan 2020 11:42:09 +0100 Michal Privoznik wrote: > On 1/15/20 5:52 PM, Igor Mammedov wrote: > > On Wed, 15 Jan 2020 16:34:53 +0100 > > Peter Krempa wrote: > > > >> On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > >>> Deprecation period is ran out and it's a time to flip the switch > >>> introduced by cd5ff8333a. > >>> Disable legacy option for new machine types and amend documentation. > >>> > >>> Signed-off-by: Igor Mammedov > >>> --- > >>> CC: peter.mayd...@linaro.org > >>> CC: ehabk...@redhat.com > >>> CC: marcel.apfelb...@gmail.com > >>> CC: m...@redhat.com > >>> CC: pbonz...@redhat.com > >>> CC: r...@twiddle.net > >>> CC: da...@gibson.dropbear.id.au > >>> CC: libvir-list@redhat.com > >>> CC: qemu-...@nongnu.org > >>> CC: qemu-...@nongnu.org > >>> --- > >>> hw/arm/virt.c| 2 +- > >>> hw/core/numa.c | 6 ++ > >>> hw/i386/pc.c | 1 - > >>> hw/i386/pc_piix.c| 1 + > >>> hw/i386/pc_q35.c | 1 + > >>> hw/ppc/spapr.c | 2 +- > >>> qemu-deprecated.texi | 16 > >>> qemu-options.hx | 8 > >>> 8 files changed, 14 insertions(+), 23 deletions(-) > >> > >> I'm afraid nobody bothered to fix it yet: > >> > >> https://bugzilla.redhat.com/show_bug.cgi?id=1783355 > > > > It's time to start working on it :) > > (looks like just deprecating stuff isn't sufficient motivation, > > maybe actual switch flipping would work out better) > > > > So how was the upgrade from older to newer version resolved? I mean, if > the old qemu used -numa node,mem=XXX and it is migrated to a host with > newer qemu, the cmd line can't be switched to -numa node,memdev=node0, > can it? I'm asking because I've just started working on this. see commit cd5ff8333a3c87 for detailed info. Short answer is it's not really resolved [*], -numa node,mem will keep working on newer QEMU but only for old machine types new machine types will accept only -numa node,memdev. One can check if "mem=' is supported by using QAPI query-machines and checking numa-mem-supported field. 
That field is flipped to false for 5.0 and later machine types in this patch. *) I might give another try to removing 'mem' completely in a migration-compatible manner, but that's well beyond the scope of this series. So far I haven't been able to convince myself that the previous attempts to do it were absolutely correct for all the corner cases that are there. > Michal
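A sketch of the capability check Igor describes: scan the QMP query-machines output for the numa-mem-supported flag before choosing the -numa syntax. The sample response below is abbreviated and illustrative, not a verbatim QEMU reply:

```python
# Decide between '-numa node,mem=' and '-numa node,memdev=' based on
# the numa-mem-supported flag from QMP query-machines.
def mem_option_supported(machines, machine_name):
    for m in machines:
        if m["name"] == machine_name:
            return m.get("numa-mem-supported", False)
    raise KeyError("unknown machine type: " + machine_name)

# Abbreviated, illustrative sample of a query-machines reply:
sample = [
    {"name": "pc-i440fx-4.2", "numa-mem-supported": True},
    {"name": "pc-i440fx-5.0", "numa-mem-supported": False},
]
```

A management layer would issue {"execute": "query-machines"} once, then pick the legacy 'mem=' syntax only for machine types where the flag is still true.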
Re: [libvirt] [PATCH v2 82/86] numa: forbid '-numa node, mem' for 5.0 and newer machine types
On Wed, 15 Jan 2020 16:34:53 +0100 Peter Krempa wrote: > On Wed, Jan 15, 2020 at 16:07:37 +0100, Igor Mammedov wrote: > > Deprecation period is ran out and it's a time to flip the switch > > introduced by cd5ff8333a. > > Disable legacy option for new machine types and amend documentation. > > > > Signed-off-by: Igor Mammedov > > --- > > CC: peter.mayd...@linaro.org > > CC: ehabk...@redhat.com > > CC: marcel.apfelb...@gmail.com > > CC: m...@redhat.com > > CC: pbonz...@redhat.com > > CC: r...@twiddle.net > > CC: da...@gibson.dropbear.id.au > > CC: libvir-list@redhat.com > > CC: qemu-...@nongnu.org > > CC: qemu-...@nongnu.org > > --- > > hw/arm/virt.c| 2 +- > > hw/core/numa.c | 6 ++ > > hw/i386/pc.c | 1 - > > hw/i386/pc_piix.c| 1 + > > hw/i386/pc_q35.c | 1 + > > hw/ppc/spapr.c | 2 +- > > qemu-deprecated.texi | 16 > > qemu-options.hx | 8 > > 8 files changed, 14 insertions(+), 23 deletions(-) > > I'm afraid nobody bothered to fix it yet: > > https://bugzilla.redhat.com/show_bug.cgi?id=1783355 It's time to start working on it :) (looks like just deprecating stuff isn't sufficient motivation, maybe actual switch flipping would work out better) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v2 86/86] numa: remove deprecated implicit RAM distribution between nodes
Feature has been deprecated since 4.1 (4bb4a273), remove it. As result if RAM distribution wasn't specified explicitly, the machine won't start and CLI should be changed to explicitly assign RAM to nodes using options: -node node,memdev (5.0 and newer machine types) -node node,mem (4.2 and older machine types) It's recommended to use "memdev" variant for new virtual machines and use "mem" only when it's necessary to migrate already existing virtual machine started with implicit RAM distribution. Signed-off-by: Igor Mammedov --- CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: r...@twiddle.net --- include/hw/boards.h | 3 --- include/sysemu/numa.h | 4 hw/core/machine.c | 6 - hw/core/numa.c| 61 +-- hw/i386/pc_piix.c | 1 - hw/i386/pc_q35.c | 1 - hw/ppc/spapr.c| 7 -- qemu-deprecated.texi | 8 --- qemu-options.hx | 16 +++--- 9 files changed, 13 insertions(+), 94 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 7f09bc9..916bb50 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -192,12 +192,9 @@ struct MachineClass { int minimum_page_bits; bool has_hotpluggable_cpus; bool ignore_memory_transaction_failures; -int numa_mem_align_shift; const char **valid_cpu_types; strList *allowed_dynamic_sysbus_devices; bool auto_enable_numa_with_memhp; -void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index ad58ee8..4173ef2 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -106,10 +106,6 @@ void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node, void numa_complete_configuration(MachineState *ms); void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms); extern QemuOptsList 
qemu_numa_opts; -void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); -void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size); void numa_cpu_pre_plug(const struct CPUArchId *slot, DeviceState *dev, Error **errp); bool numa_uses_legacy_mem(void); diff --git a/hw/core/machine.c b/hw/core/machine.c index d8fa45c..0862f45 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -747,12 +747,6 @@ static void machine_class_init(ObjectClass *oc, void *data) mc->rom_file_has_mr = true; mc->smp_parse = smp_parse; -/* numa node memory size aligned on 8MB by default. - * On Linux, each node's border has to be 8MB aligned - */ -mc->numa_mem_align_shift = 23; -mc->numa_auto_assign_ram = numa_default_auto_assign_ram; - object_class_property_add_str(oc, "kernel", machine_get_kernel, machine_set_kernel, &error_abort); object_class_property_set_description(oc, "kernel", diff --git a/hw/core/numa.c b/hw/core/numa.c index 47d5ea1..591e62a 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -627,42 +627,6 @@ static void complete_init_numa_distance(MachineState *ms) } } -void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size) -{ -int i; -uint64_t usedmem = 0; - -/* Align each node according to the alignment - * requirements of the machine class - */ - -for (i = 0; i < nb_nodes - 1; i++) { -nodes[i].node_mem = (size / nb_nodes) & -~((1 << mc->numa_mem_align_shift) - 1); -usedmem += nodes[i].node_mem; -} -nodes[i].node_mem = size - usedmem; -} - -void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, - int nb_nodes, ram_addr_t size) -{ -int i; -uint64_t usedmem = 0, node_mem; -uint64_t granularity = size / nb_nodes; -uint64_t propagate = 0; - -for (i = 0; i < nb_nodes - 1; i++) { -node_mem = (granularity + propagate) & - ~((1 << mc->numa_mem_align_shift) - 1); -propagate = granularity + propagate - node_mem; -nodes[i].node_mem = 
node_mem; -usedmem += node_mem; -} -nodes[i].node_mem = size - usedmem; -} - static void numa_init_memdev_container(MachineState *ms, MemoryRegion *ram) { int i; @@ -732,30 +696,15 @@ void numa_complete_configuration(MachineState *ms) ms->numa_state->num_nodes = MAX_NODES;
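For reference, the two C helpers removed by the patch above can be transcribed into Python to show what the implicit RAM distribution actually computed. This is an illustrative sketch, not QEMU code; the 8 MiB alignment shift (23) comes from the removed ``machine_class_init()`` default shown in the diff.

```python
ALIGN_SHIFT = 23                     # 8 MiB node alignment, per the removed default
MASK = ~((1 << ALIGN_SHIFT) - 1)

def legacy_auto_assign(size, nb_nodes):
    """Port of the removed numa_legacy_auto_assign_ram(): every node but
    the last gets an equal share rounded down to the alignment; the last
    node absorbs whatever remains."""
    nodes = [(size // nb_nodes) & MASK for _ in range(nb_nodes - 1)]
    nodes.append(size - sum(nodes))
    return nodes

def default_auto_assign(size, nb_nodes):
    """Port of the removed numa_default_auto_assign_ram(): the rounding
    remainder is propagated to the next node instead of piling up
    entirely on the last one."""
    granularity = size // nb_nodes
    propagate = 0
    nodes = []
    for _ in range(nb_nodes - 1):
        node_mem = (granularity + propagate) & MASK
        propagate = granularity + propagate - node_mem
        nodes.append(node_mem)
    nodes.append(size - sum(nodes))
    return nodes

GiB = 1 << 30
print(legacy_auto_assign(3 * GiB, 2))  # → [1610612736, 1610612736]
```

After this patch, neither distribution is applied for new machine types: the user must assign RAM to nodes explicitly with ``-numa node,memdev``.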
[libvirt] [PATCH v2 82/86] numa: forbid '-numa node, mem' for 5.0 and newer machine types
Deprecation period is ran out and it's a time to flip the switch introduced by cd5ff8333a. Disable legacy option for new machine types and amend documentation. Signed-off-by: Igor Mammedov --- CC: peter.mayd...@linaro.org CC: ehabk...@redhat.com CC: marcel.apfelb...@gmail.com CC: m...@redhat.com CC: pbonz...@redhat.com CC: r...@twiddle.net CC: da...@gibson.dropbear.id.au CC: libvir-list@redhat.com CC: qemu-...@nongnu.org CC: qemu-...@nongnu.org --- hw/arm/virt.c| 2 +- hw/core/numa.c | 6 ++ hw/i386/pc.c | 1 - hw/i386/pc_piix.c| 1 + hw/i386/pc_q35.c | 1 + hw/ppc/spapr.c | 2 +- qemu-deprecated.texi | 16 qemu-options.hx | 8 8 files changed, 14 insertions(+), 23 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e2fbca3..49de0d8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2049,7 +2049,6 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) hc->pre_plug = virt_machine_device_pre_plug_cb; hc->plug = virt_machine_device_plug_cb; hc->unplug_request = virt_machine_device_unplug_request_cb; -mc->numa_mem_supported = true; mc->auto_enable_numa_with_memhp = true; mc->default_ram_id = "mach-virt.ram"; } @@ -2153,6 +2152,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 0) static void virt_machine_4_2_options(MachineClass *mc) { compat_props_add(mc->compat_props, hw_compat_4_2, hw_compat_4_2_len); +mc->numa_mem_supported = true; } DEFINE_VIRT_MACHINE(4, 2) diff --git a/hw/core/numa.c b/hw/core/numa.c index 0970a30..3177066 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -117,6 +117,12 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, } if (node->has_mem) { +if (!mc->numa_mem_supported) { +error_setg(errp, "Parameter -numa node,mem is not supported by this" + " machine type. 
Use -numa node,memdev instead"); +return; +} + numa_info[nodenr].node_mem = node->mem; if (!qtest_enabled()) { warn_report("Parameter -numa node,mem is deprecated," diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 21b8290..fa8d024 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1947,7 +1947,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) hc->unplug = pc_machine_device_unplug_cb; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; -mc->numa_mem_supported = true; mc->default_ram_id = "pc.ram"; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index fa12203..0a9b9e0 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -435,6 +435,7 @@ static void pc_i440fx_4_2_machine_options(MachineClass *m) pc_i440fx_5_0_machine_options(m); m->alias = NULL; m->is_default = 0; +m->numa_mem_supported = true; compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len); compat_props_add(m->compat_props, pc_compat_4_2, pc_compat_4_2_len); } diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 84cf925..4d6e2be 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -363,6 +363,7 @@ static void pc_q35_4_2_machine_options(MachineClass *m) { pc_q35_5_0_machine_options(m); m->alias = NULL; +m->numa_mem_supported = true; compat_props_add(m->compat_props, hw_compat_4_2, hw_compat_4_2_len); compat_props_add(m->compat_props, pc_compat_4_2, pc_compat_4_2_len); } diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index bcbe1f1..2686b73 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4383,7 +4383,6 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; -mc->numa_mem_supported = true; mc->auto_enable_numa = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; @@ -4465,6 +4464,7 @@ static void spapr_machine_4_2_class_options(MachineClass *mc) { 
spapr_machine_5_0_class_options(mc); compat_props_add(mc->compat_props, hw_compat_4_2, hw_compat_4_2_len); +mc->numa_mem_supported = true; } DEFINE_SPAPR_MACHINE(4_2, "4.2", false); diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 982af95..17a0e1d 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -89,22 +89,6 @@ error in the future. The @code{-realtime mlock=on|off} argument has been replaced by the @code{-overcommit mem-lock=on|off} argument. -@subsection -numa node,mem=@var{size} (since 4.1) - -The parameter @option{mem} of @option{-numa node} is used to assign a part of -guest RAM to a NUMA node. But when using it, it's impossible to manage specified -RAM chunk on the host side (like bind it to
Re: [libvirt] [PATCH 2/2] Add -mem-shared option
On Tue, 10 Dec 2019 11:34:32 +0100 Markus Armbruster wrote: > Eduardo Habkost writes: > > > +Markus > > > > On Tue, Dec 03, 2019 at 03:43:03PM +0100, Igor Mammedov wrote: > >> On Tue, 3 Dec 2019 09:56:15 +0100 > >> Thomas Huth wrote: > >> > >> > On 02/12/2019 22.00, Eduardo Habkost wrote: > >> > > On Mon, Dec 02, 2019 at 08:39:48AM +0100, Igor Mammedov wrote: > >> > >> On Fri, 29 Nov 2019 18:46:12 +0100 > >> > >> Paolo Bonzini wrote: > >> > >> > >> > >>> On 29/11/19 13:16, Igor Mammedov wrote: > >> > >>>> As for "-m", I'd make it just an alias that translates > >> > >>>> -m/mem-path/mem-prealloc > >> > >>> > >> > >>> I think we should just deprecate -mem-path/-mem-prealloc in 5.0. > >> > >>> CCing > >> > >>> Thomas as mister deprecation. :) > >> > >> > >> > >> I'll add that to my series > >> > > > >> > > Considering that the plan is to eventually reimplement those > >> > > options as syntactic sugar for memory backend options (hopefully > >> > > in less than 2 QEMU releases), what's the point of deprecating > >> > > them? > >> > > >> > Well, it depends on the "classification" [1] of the parameter... > >> > > >> > Let's ask: What's the main purpose of the option? > >> > > >> > Is it easier to use than the "full" option, and thus likely to be used > >> > by a lot of people who run QEMU directly from the CLI? In that case it > >> > should stay as "convenience option" and not be deprecated. > >> > > >> > Or is the option merely there to give the upper layers like libvirt or > >> > some few users and their scripts some more grace period to adapt their > >> > code, but we all agree that the options are rather ugly and should > >> > finally go away? Then it's rather a "legacy option" and the deprecation > >> > process is the right way to go. Our QEMU interface is still way > >> > overcrowded, we should try to keep it as clean as possible. 
> >> > >> After switching to memdev for main RAM, users could use relatively > >> short global options > >> -global memory-backend.prealloc|share=on > >> and > >> -global memory-backend-file.mem-path=X|prealloc|share=on > >> > >> instead of us adding and maintaining slightly shorter > >> -mem-shared/-mem-path/-mem-prealloc > > > > Global properties are a convenient way to expose knobs through > > the command line with little effort, but we have no documentation > > on which QOM properties are really supposed to be touched by > > users using -global. > > > > Unless we fix the lack of documentation, I'd prefer to have > > syntactic sugar translated to -global instead of recommending > > direct usage of -global. > > Fair point. > > I'd take QOM property documentation over still more sugar. > > Sometimes, the practical way to make simple things simple is sugar. I > can accept that. This doesn't look like such a case, though. I can document concrete globals as replacement at the place -mem-path/-mem-prealloc are documented during deprecation and then in 2 releases we will just drop legacy syntax and keep only globals over there. (eventually it will spread various globals over man page, which I don't like but we probably should start somwhere and consolidate later if globals in man page become normal practice.) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
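To make the suggested replacement concrete, the following is a hypothetical helper (not part of QEMU or libvirt) that translates the legacy ``-mem-path``/``-mem-prealloc`` switches, plus the proposed ``-mem-shared``, into the ``-global memory-backend*`` properties discussed above. The path used in the example is illustrative.

```python
def legacy_to_globals(mem_path=None, mem_prealloc=False, mem_shared=False):
    """Hypothetical translation of the legacy memory CLI switches into
    the -global memory-backend* properties suggested in the thread."""
    # File-backed RAM uses memory-backend-file; plain RAM the base class.
    backend = "memory-backend-file" if mem_path else "memory-backend"
    args = []
    if mem_path:
        args += ["-global", f"memory-backend-file.mem-path={mem_path}"]
    if mem_prealloc:
        args += ["-global", f"{backend}.prealloc=on"]
    if mem_shared:
        args += ["-global", f"{backend}.share=on"]
    return args

# Illustrative path only:
print(legacy_to_globals(mem_path="/dev/hugepages/guest", mem_prealloc=True))
```

This mirrors the trade-off discussed above: the ``-global`` form is slightly longer than dedicated sugar options, but needs no extra option parsing code in QEMU.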
Re: [libvirt] [PATCH 2/2] Add -mem-shared option
On Tue, 3 Dec 2019 09:56:15 +0100 Thomas Huth wrote: > On 02/12/2019 22.00, Eduardo Habkost wrote: > > On Mon, Dec 02, 2019 at 08:39:48AM +0100, Igor Mammedov wrote: > >> On Fri, 29 Nov 2019 18:46:12 +0100 > >> Paolo Bonzini wrote: > >> > >>> On 29/11/19 13:16, Igor Mammedov wrote: > >>>> As for "-m", I'd make it just an alias that translates > >>>> -m/mem-path/mem-prealloc > >>> > >>> I think we should just deprecate -mem-path/-mem-prealloc in 5.0. CCing > >>> Thomas as mister deprecation. :) > >> > >> I'll add that to my series > > > > Considering that the plan is to eventually reimplement those > > options as syntactic sugar for memory backend options (hopefully > > in less than 2 QEMU releases), what's the point of deprecating > > them? > > Well, it depends on the "classification" [1] of the parameter... > > Let's ask: What's the main purpose of the option? > > Is it easier to use than the "full" option, and thus likely to be used > by a lot of people who run QEMU directly from the CLI? In that case it > should stay as "convenience option" and not be deprecated. > > Or is the option merely there to give the upper layers like libvirt or > some few users and their scripts some more grace period to adapt their > code, but we all agree that the options are rather ugly and should > finally go away? Then it's rather a "legacy option" and the deprecation > process is the right way to go. Our QEMU interface is still way > overcrowded, we should try to keep it as clean as possible. After switching to memdev for main RAM, users could use relatively short global options -global memory-backend.prealloc|share=on and -global memory-backend-file.mem-path=X|prealloc|share=on instead of us adding and maintaining slightly shorter -mem-shared/-mem-path/-mem-prealloc > Thomas > > > [1] Using the terms from: > https://www.youtube.com/watch?v=Oscjpkns7tM&t=8m -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v2] deprecate -mem-path fallback to anonymous RAM
Fallback might affect guest or worse whole host performance or functionality if backing file were used to share guest RAM with another process. Patch deprecates fallback so that we could remove it in future and ensure that QEMU will provide expected behavior and fail if it can't use user provided backing file. Signed-off-by: Igor Mammedov --- v2: * improve text language (Markus Armbruster ) numa.c | 6 -- qemu-deprecated.texi | 9 + 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/numa.c b/numa.c index 91a29138a2..c15e53e92d 100644 --- a/numa.c +++ b/numa.c @@ -494,8 +494,10 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, if (mem_prealloc) { exit(1); } -error_report("falling back to regular RAM allocation."); - +warn_report("falling back to regular RAM allocation"); +error_printf("This is deprecated. Make sure that -mem-path " + " specified path has sufficient resources to allocate" + " -m specified RAM amount or QEMU will fail to start"); /* Legacy behavior: if allocation failed, fall back to * regular RAM allocation. */ diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 2fe9b72121..1b7f3b10dc 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -112,6 +112,15 @@ QEMU using implicit generic or board specific splitting rule. Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} (if it's supported by used machine type) to define mapping explictly instead. +@subsection -mem-path fallback to RAM (since 4.1) +Currently if guest RAM allocation from file pointed by @option{mem-path} +fails, QEMU falls back to allocating from RAM, which might result +in unpredictable behavior since the backing file specified by the user +is ignored. In the future, users will be responsible for making sure +the backing storage specified with @option{-mem-path} can actually provide +the guest RAM configured with @option{-m} and fail to start up if RAM allocation +is unsuccessful. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.18.1 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
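The control flow the patch above deprecates can be summarized with a small Python model. This is a simplification of ``allocate_system_memory_nonnuma()`` for illustration only, not the real QEMU code: it shows what happens today when allocating guest RAM from the ``-mem-path`` file fails.

```python
def resolve_allocation(file_alloc_ok, mem_prealloc):
    """Model of the (deprecated) -mem-path fallback decision:
    file_alloc_ok  -- whether allocation from the -mem-path file succeeded
    mem_prealloc   -- whether -mem-prealloc was given on the command line
    """
    if file_alloc_ok:
        return "file-backed"
    if mem_prealloc:
        # With -mem-prealloc, allocation failure is already fatal.
        return "exit"
    # Deprecated legacy behaviour: warn, then silently ignore -mem-path.
    print("warning: falling back to regular RAM allocation (deprecated)")
    return "anonymous-fallback"
```

Once the deprecation period expires, the third branch is intended to disappear: a failed file-backed allocation will make QEMU fail to start instead of quietly using anonymous RAM.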
Re: [libvirt] [Qemu-devel] [PATCH] deprecate -mem-path fallback to anonymous RAM
On Mon, 24 Jun 2019 16:01:49 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > On Mon, 24 Jun 2019 10:17:33 +0200 > > Markus Armbruster wrote: > > > >> Igor Mammedov writes: > >> > >> > Fallback might affect guest or worse whole host performance > >> > or functionality if backing file were used to share guest RAM > >> > with another process. > >> > > >> > Patch deprecates fallback so that we could remove it in future > >> > and ensure that QEMU will provide expected behavior and fail if > >> > it can't use user provided backing file. > >> > > >> > Signed-off-by: Igor Mammedov > >> > --- > >> > PS: > >> > Patch is written on top of > >> > [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory > >> > distribution > >> > to avoid conflicts in qemu-deprecated.texi > >> > > >> > numa.c | 4 ++-- > >> > qemu-deprecated.texi | 8 > >> > 2 files changed, 10 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/numa.c b/numa.c > >> > index 91a29138a2..53d67b8ad9 100644 > >> > --- a/numa.c > >> > +++ b/numa.c > >> > @@ -494,8 +494,8 @@ static void > >> > allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, > >> > if (mem_prealloc) { > >> > exit(1); > >> > } > >> > -error_report("falling back to regular RAM allocation."); > >> > - > >> > +warn_report("falling back to regular RAM allocation. " > >> > +"Fallback to RAM allocation is deprecated."); > >> > > >> > >> Can we give the user clues on how to avoid the deprecated fallback? > > > > I've intentionally left it out for a lack of clear enough advise. > > Something like: > > "Make sure that host has resources to map file pointed by -mem-path" > > would be pretty useless. > > I see. > > > I think describing how host should be configured in various ways > > depending on type of backing storage is well out of scope of any > > QEMU documentation. But if you have an idea to what to put there > > (or what to put in deprecation doc and refer to from here), > > I'll add it on respin. 
> > > >> Warning message nitpick: the message should be a single phrase, with no > >> newline or trailing punctuation. Suggest something like > >> > >>warn_report("falling back to regular RAM allocation"); > >>error_printf("This is deprecated. >> "to do goes here>\n"); > >> > >> > /* Legacy behavior: if allocation failed, fall back to > >> > * regular RAM allocation. > >> > */ > >> > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > >> > index 2fe9b72121..2193705644 100644 > >> > --- a/qemu-deprecated.texi > >> > +++ b/qemu-deprecated.texi > >> > @@ -112,6 +112,14 @@ QEMU using implicit generic or board specific > >> > splitting rule. > >> > Use @option{memdev} with @var{memory-backend-ram} backend or > >> > @option{mem} (if > >> > it's supported by used machine type) to define mapping explictly > >> > instead. > >> > > >> > +@subsection -mem-path fallback to RAM (since 4.1) > >> > +Currently if system memory allocation from file pointed by > >> > @option{mem-path} > >> > +fails, QEMU fallbacks to allocating from anonymous RAM. Which might > >> > result > >> > +in unpredictable behavior since provided backing file wasn't used. > >> > >> > >> Noch such verb "to fallback", obvious fix "QEMU falls back to" > >> > >> Suggest "RAM, which might". > >> > >> Better: "since the backing file specified by the user is ignored". > >> > >> > In > >> > future > >> > +QEMU will not fallback and fail to start up, so user could fix his/her > >> > QEMU/host > >> > +configuration or explicitly use -m without -mem-path if system memo
Re: [libvirt] [Qemu-devel] [PATCH] deprecate -mem-path fallback to anonymous RAM
On Mon, 24 Jun 2019 10:36:55 +0100 Daniel P. Berrangé wrote: > On Mon, Jun 24, 2019 at 10:17:33AM +0200, Markus Armbruster wrote: > > Igor Mammedov writes: > > > > > Fallback might affect guest or worse whole host performance > > > or functionality if backing file were used to share guest RAM > > > with another process. > > > > > > Patch deprecates fallback so that we could remove it in future > > > and ensure that QEMU will provide expected behavior and fail if > > > it can't use user provided backing file. > > > > > > Signed-off-by: Igor Mammedov > > > --- > > > PS: > > > Patch is written on top of > > > [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory > > > distribution > > > to avoid conflicts in qemu-deprecated.texi > > > > > > numa.c | 4 ++-- > > > qemu-deprecated.texi | 8 > > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > > > diff --git a/numa.c b/numa.c > > > index 91a29138a2..53d67b8ad9 100644 > > > --- a/numa.c > > > +++ b/numa.c > > > @@ -494,8 +494,8 @@ static void > > > allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, > > > if (mem_prealloc) { > > > exit(1); > > > } > > > -error_report("falling back to regular RAM allocation."); > > > - > > > +warn_report("falling back to regular RAM allocation. " > > > +"Fallback to RAM allocation is deprecated."); > > > > Can we give the user clues on how to avoid the deprecated fallback? > > There's nothing a user can do aside from ensuring they have sufficient > free memory before launching QEMU to satisfy the huge pag request. > > Probably just needs changing to do. > > "This is deprecated, future QEMU releases will exit when > huge pages cannot be allocated" Also it could be that users might use other than hugepages backing storage, that's why I completely left concrete advice out from suggestion. User should know what he/she is doing when providing mem-path, if user supplies mis-configured path QEMU will print error from memory-backend-file if/when allocation fails. 
> Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH] deprecate -mem-path fallback to anonymous RAM
On Mon, 24 Jun 2019 10:17:33 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > Fallback might affect guest or worse whole host performance > > or functionality if backing file were used to share guest RAM > > with another process. > > > > Patch deprecates fallback so that we could remove it in future > > and ensure that QEMU will provide expected behavior and fail if > > it can't use user provided backing file. > > > > Signed-off-by: Igor Mammedov > > --- > > PS: > > Patch is written on top of > > [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory > > distribution > > to avoid conflicts in qemu-deprecated.texi > > > > numa.c | 4 ++-- > > qemu-deprecated.texi | 8 > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/numa.c b/numa.c > > index 91a29138a2..53d67b8ad9 100644 > > --- a/numa.c > > +++ b/numa.c > > @@ -494,8 +494,8 @@ static void allocate_system_memory_nonnuma(MemoryRegion > > *mr, Object *owner, > > if (mem_prealloc) { > > exit(1); > > } > > -error_report("falling back to regular RAM allocation."); > > - > > +warn_report("falling back to regular RAM allocation. " > > +"Fallback to RAM allocation is deprecated."); > > Can we give the user clues on how to avoid the deprecated fallback? I've intentionally left it out for a lack of clear enough advise. Something like: "Make sure that host has resources to map file pointed by -mem-path" would be pretty useless. I think describing how host should be configured in various ways depending on type of backing storage is well out of scope of any QEMU documentation. But if you have an idea to what to put there (or what to put in deprecation doc and refer to from here), I'll add it on respin. > Warning message nitpick: the message should be a single phrase, with no > newline or trailing punctuation. Suggest something like > >warn_report("falling back to regular RAM allocation"); >error_printf("This is deprecated. 
"to do goes here>\n"); > > > /* Legacy behavior: if allocation failed, fall back to > > * regular RAM allocation. > > */ > > diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi > > index 2fe9b72121..2193705644 100644 > > --- a/qemu-deprecated.texi > > +++ b/qemu-deprecated.texi > > @@ -112,6 +112,14 @@ QEMU using implicit generic or board specific > > splitting rule. > > Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} > > (if > > it's supported by used machine type) to define mapping explictly instead. > > > > +@subsection -mem-path fallback to RAM (since 4.1) > > +Currently if system memory allocation from file pointed by > > @option{mem-path} > > +fails, QEMU fallbacks to allocating from anonymous RAM. Which might result > > +in unpredictable behavior since provided backing file wasn't used. > > > Noch such verb "to fallback", obvious fix "QEMU falls back to" > > Suggest "RAM, which might". > > Better: "since the backing file specified by the user is ignored". > > > In > > future > > +QEMU will not fallback and fail to start up, so user could fix his/her > > QEMU/host > > +configuration or explicitly use -m without -mem-path if system memory > > allocated > > +from anonymous RAM suits usecase. > > What's "system memory allocation"? Using man page language, would be 'guest startup RAM size' acceptable? > Perhaps: "In the future, QEMU will not fall back, but fail instead. > Adjust either the host configuration [FIXME how?] or the QEMU > configuration [FIXME how?]." Maybe " In the future, QEMU will not fall back, but fail instead. Adjust either the QEMU configuration by removing @option{-mem-path} so QEMU will use only anonymous or host configuration to make sure that there are sufficient resources on backing storage pointed by -mem-path to allocate amount specified by @option{-m}. 
" > > + > > @section QEMU Machine Protocol (QMP) commands > > > > @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] deprecate -mem-path fallback to anonymous RAM
Fallback might affect guest or worse whole host performance or functionality if backing file were used to share guest RAM with another process. Patch deprecates fallback so that we could remove it in future and ensure that QEMU will provide expected behavior and fail if it can't use user provided backing file. Signed-off-by: Igor Mammedov --- PS: Patch is written on top of [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution to avoid conflicts in qemu-deprecated.texi numa.c | 4 ++-- qemu-deprecated.texi | 8 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/numa.c b/numa.c index 91a29138a2..53d67b8ad9 100644 --- a/numa.c +++ b/numa.c @@ -494,8 +494,8 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, if (mem_prealloc) { exit(1); } -error_report("falling back to regular RAM allocation."); - +warn_report("falling back to regular RAM allocation. " +"Fallback to RAM allocation is deprecated."); /* Legacy behavior: if allocation failed, fall back to * regular RAM allocation. */ diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 2fe9b72121..2193705644 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -112,6 +112,14 @@ QEMU using implicit generic or board specific splitting rule. Use @option{memdev} with @var{memory-backend-ram} backend or @option{mem} (if it's supported by used machine type) to define mapping explictly instead. +@subsection -mem-path fallback to RAM (since 4.1) +Currently if system memory allocation from file pointed by @option{mem-path} +fails, QEMU fallbacks to allocating from anonymous RAM. Which might result +in unpredictable behavior since provided backing file wasn't used. In future +QEMU will not fallback and fail to start up, so user could fix his/her QEMU/host +configuration or explicitly use -m without -mem-path if system memory allocated +from anonymous RAM suits usecase. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.18.1 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution
On Thu, 30 May 2019 10:33:16 +0200 Igor Mammedov wrote: > Changes since v3: > - simplify series by dropping idea of showing property values in > "qom-list-properties" > and use MachineInfo in QAPI schema instead > > Changes since v2: > - taking in account previous review, implement a way for mgmt to intospect > if > '-numa node,mem' is supported by machine type as suggested by Daniel at > https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html > * ammend "qom-list-properties" to show property values > * add "numa-mem-supported" machine property to reflect if '-numa > node,mem=SZ' > is supported. It culd be used with '-machine none' or at runtime with > --preconfig before numa memory mapping are configured > * minor fixes to deprecation documentation mentioning "numa-mem-supported" > property > > 1) "I'm considering to deprecating -mem-path/prealloc CLI options and > replacing > them with a single memdev Machine property to allow interested users to pick > used backend for initial RAM (fixes mixed -mem-path+hostmem backends issues) > and as a transition step to modeling initial RAM as a Device instead of > (ab)using MemoryRegion APIs." > (for more details see: > https://www.mail-archive.com/qemu-devel@nongnu.org/msg596314.html) > > However there is a couple of roadblocks on the way (s390x and numa memory > handling). > I think I finally thought out a way to hack s390x in migration compatible > manner, > but I don't see any way to do it for -numa node,mem and default RAM > assignement > to nodes. Considering both numa usecases aren't meaningfully using NUMA (aside > guest side testing) and could be replaced with explicitly used memdev > parameter, > I'd like to propose removing these fake NUMA friends on new machine types, > hence this deprecation. And once the last machie type that supported the > option > is removed we would be able to remove option altogether. 
> > As a result of removing deprecated options and replacing initial RAM allocation > with 'memdev's (1), QEMU will allocate guest RAM in a consistent way, fixing the > mixed > use case and allowing boards to move towards modelling initial RAM as > Device(s), > which in turn should allow cleaning up the NUMA/HMP/memory accounting code > further by dropping ad-hoc node_mem tracking and reusing memory device > enumeration > instead. Eduardo, could you take and merge it via the numa/machine tree? > > Reference to previous versions: > * https://www.mail-archive.com/qemu-devel@nongnu.org/msg617694.html > > CC: libvir-list@redhat.com > CC: ehabk...@redhat.com > CC: pbonz...@redhat.com > CC: berra...@redhat.com > CC: arm...@redhat.com > > Igor Mammedov (3): > machine: show if CLI option '-numa node,mem' is supported in QAPI > schema > numa: deprecate 'mem' parameter of '-numa node' option > numa: deprecate implicit memory distribution between nodes > > include/hw/boards.h | 3 +++ > hw/arm/virt.c| 1 + > hw/i386/pc.c | 1 + > hw/ppc/spapr.c | 1 + > numa.c | 5 +++++ > qapi/misc.json | 5 ++++- > qemu-deprecated.texi | 24 ++++++++++++++++++++++++ > vl.c | 1 + > 8 files changed, 40 insertions(+), 1 deletion(-) > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v5 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema
The legacy '-numa node,mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's not possible to replace it with the alternative '-numa node,memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping the option working with old machine types only. To help users find out whether the deprecated CLI option '-numa node,mem' is still supported by a particular machine type, add a new "numa-mem-supported" property to the output of query-machines. "numa-mem-supported" is set to 'true' for machines that currently support NUMA, but it will be flipped to 'false' later on, once the deprecation period expires; it will be kept 'true' only for old machine types that used to support the legacy option, so existing configurations that use it won't break. Signed-off-by: Igor Mammedov --- v5: (Markus Armbruster ) * s/by machine type/by the machine type/ * amend commit message s/to MachineInfo description in QAPI schema/to output of query-machines/ v4: * drop idea to use "qom-list-properties" and use MachineInfo instead which could be inspected with 'query-machines' include/hw/boards.h | 3 +++ hw/arm/virt.c | 1 + hw/i386/pc.c| 1 + hw/ppc/spapr.c | 1 + qapi/misc.json | 5 ++++- vl.c| 1 + 6 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 6ff02bf..ab6badc 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -158,6 +158,8 @@ typedef struct { * @kvm_type: *Return the type of KVM corresponding to the kvm-type string option or *computed based on other criteria such as the host kernel capabilities. 
+ * @numa_mem_supported: + *true if '-numa node,mem' option is supported and false otherwise */ struct MachineClass { /*< private >*/ @@ -210,6 +212,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; +bool numa_mem_supported; HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index bf54f10..481a603 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; +mc->numa_mem_supported = true; } static void virt_instance_init(Object *obj) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index edc240b..25146d7 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2750,6 +2750,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; +mc->numa_mem_supported = true; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index e2b33e5..89d5814 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4340,6 +4340,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; +mc->numa_mem_supported = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..2dbfdf0 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -2018,12 +2018,15 @@ # # @hotpluggable-cpus: cpu hotplug via -device is supported (since 2.7.0) # +# @numa-mem-supported: true if '-numa node,mem' option is supported by +# the machine type and false otherwise 
(since 4.1) +# # Since: 1.2.0 ## { 'struct': 'MachineInfo', 'data': { 'name': 'str', '*alias': 'str', '*is-default': 'bool', 'cpu-max': 'int', -'hotpluggable-cpus': 'bool'} } +'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool'} } ## # @query-machines: diff --git a/vl.c b/vl.c index cd1fbc4..f5b083f 100644 --- a/vl.c +++ b/vl.c @@ -1428,6 +1428,7 @@ MachineInfoList *qmp_query_machines(Error **errp) info->name = g_strdup(mc->name); info->cpu_max = !mc->max_cpus ? 1 : mc->max_cpus; info->hotpluggable_cpus = mc->has_hotpluggable_cpus; +info->numa_mem_supported = mc->numa_mem_supported; entry = g_malloc0(sizeof(*entry)); entry->value = info; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
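To make the new interface concrete, here is a minimal, illustrative sketch (not part of the patch) of how a management application might consume the new field from a query-machines reply. The reply data and the helper name are hypothetical; the member names ("name", "cpu-max", "hotpluggable-cpus", "numa-mem-supported") follow the MachineInfo change above.

```python
# Hypothetical excerpt of a QMP `query-machines` reply, showing the
# `numa-mem-supported` member added by this patch.
reply = [
    {"name": "pc-i440fx-4.1", "cpu-max": 255,
     "hotpluggable-cpus": True, "numa-mem-supported": True},
    {"name": "none", "cpu-max": 1,
     "hotpluggable-cpus": False, "numa-mem-supported": False},
]

def numa_mem_supported(machines, machine_type):
    """Return True if the machine type still accepts '-numa node,mem'."""
    for m in machines:
        if m["name"] == machine_type or m.get("alias") == machine_type:
            return m["numa-mem-supported"]
    raise KeyError(machine_type)

# A client would pick the legacy '-numa node,mem' only where it is still
# supported, and '-numa node,memdev' otherwise.
assert numa_mem_supported(reply, "pc-i440fx-4.1") is True
assert numa_mem_supported(reply, "none") is False
```

A libvirt-style client would typically run query-machines once per QEMU binary and cache the flag per machine type, e.g. as part of its capabilities probing.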
Re: [libvirt] [Qemu-devel] [PATCH v4 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema
On Fri, 07 Jun 2019 19:39:17 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > Legacy '-numa node,mem' option has a number of issues and mgmt often > > defaults to it. Unfortunately it's no possible to replace it with > > an alternative '-numa memdev' without breaking migration compatibility. > > What's possible though is to deprecate it, keeping option working with > > old machine types only. > > > > In order to help users to find out if being deprecated CLI option > > '-numa node,mem' is still supported by particular machine type, add new > > "numa-mem-supported" property to MachineInfo description in QAPI schema. > > Suggest s/to MachineInfo description in QAPI schema/to output of > query-machines/, because query-machines is the external interface people > know. fixed > > > "numa-mem-supported" is set to 'true' for machines that currently support > > NUMA, but it will be flipped to 'false' later on, once deprecation period > > expires and kept 'true' only for old machine types that used to support > > the legacy option so it won't break existing configuration that are using > > it. > > > > Signed-off-by: Igor Mammedov > > --- > > > > Notes: > > v4: > > * drop idea to use "qom-list-properties" and use MachineInfo instead > > which could be inspected with 'query-machines' > > > > include/hw/boards.h | 3 +++ > > hw/arm/virt.c | 1 + > > hw/i386/pc.c| 1 + > > hw/ppc/spapr.c | 1 + > > qapi/misc.json | 5 - > > vl.c| 1 + > > 6 files changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h > > index 6f7916f..86894b6 100644 > > --- a/include/hw/boards.h > > +++ b/include/hw/boards.h > > @@ -158,6 +158,8 @@ typedef struct { > > * @kvm_type: > > *Return the type of KVM corresponding to the kvm-type string option or > > *computed based on other criteria such as the host kernel > > capabilities. 
> > + * @numa_mem_supported: > > + *true if '--numa node.mem' option is supported and false otherwise > > */ > > struct MachineClass { > > /*< private >*/ > > @@ -210,6 +212,7 @@ struct MachineClass { > > bool ignore_boot_device_suffixes; > > bool smbus_no_migration_support; > > bool nvdimm_supported; > > +bool numa_mem_supported; > > > > HotplugHandler *(*get_hotplug_handler)(MachineState *machine, > > DeviceState *dev); > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index bf54f10..481a603 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, > > void *data) > > assert(!mc->get_hotplug_handler); > > mc->get_hotplug_handler = virt_machine_get_hotplug_handler; > > hc->plug = virt_machine_device_plug_cb; > > +mc->numa_mem_supported = true; > > } > > > > static void virt_instance_init(Object *obj) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > > index 2632b73..05b8368 100644 > > --- a/hw/i386/pc.c > > +++ b/hw/i386/pc.c > > @@ -2747,6 +2747,7 @@ static void pc_machine_class_init(ObjectClass *oc, > > void *data) > > nc->nmi_monitor_handler = x86_nmi; > > mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; > > mc->nvdimm_supported = true; > > +mc->numa_mem_supported = true; > > > > object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", > > pc_machine_get_device_memory_region_size, NULL, > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index 2ef3ce4..265ecfb 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, > > void *data) > > * in which LMBs are represented and hot-added > > */ > > mc->numa_mem_align_shift = 28; > > +mc->numa_mem_supported = true; > > > > smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; > > smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; > > This is correct when the TYPE_VIRT_MACHINE, TYPE_PC_MACHINE and > TYPE_SPAPR_MACHINE are exactly the machines supporting 
NUMA. How could > I check that? We don't have an interface to communicate that to
Re: [libvirt] [Qemu-devel] [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution
On Fri, 07 Jun 2019 19:28:58 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > Changes since v3: > > - simplify series by dropping idea of showing property values in > > "qom-list-properties" > > and use MachineInfo in QAPI schema instead > > Where did "[PATCH v3 1/6] pc: fix possible NULL pointer dereference in > pc_machine_get_device_memory_region_size()" go? It fixes a crash bug... I'll post it as a separate patch as it's no longer related to this series -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v4 3/3] numa: deprecate implicit memory distribution between nodes
Implicit RAM distribution between nodes has exactly the same issues as: "numa: deprecate 'mem' parameter of '-numa node' option", only with QEMU itself being the user that 'adds' the 'mem' parameter. Deprecate it to get it out of the way, so that we can consolidate guest RAM allocation using memory backends, making it consistent, and possibly later on transition to using memory devices instead of ad hoc memory mapping for the initial RAM. Signed-off-by: Igor Mammedov --- numa.c | 3 +++ qemu-deprecated.texi | 8 ++++++++ 2 files changed, 11 insertions(+) diff --git a/numa.c b/numa.c index 2205773..6d45a1f 100644 --- a/numa.c +++ b/numa.c @@ -409,6 +409,9 @@ void numa_complete_configuration(MachineState *ms) if (i == nb_numa_nodes) { assert(mc->numa_auto_assign_ram); mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size); +warn_report("Default splitting of RAM between nodes is deprecated." +" Use '-numa node,memdev' to explicitly define RAM" +" allocation per node"); } numa_total = 0; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index eb347f5..c744ba9 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -98,6 +98,12 @@ In future new machine versions will not accept the option but it will still work with old machine types. User can check QAPI schema to see if the legacy option is supported by looking at MachineInfo::numa-mem-supported property. +@subsection -numa node (without memory specified) (since 4.1) + +Splitting RAM by default between NUMA nodes has the same issues as the @option{mem} +parameter described above, with the difference that QEMU itself plays the role +of the user, applying an implicit generic or board-specific splitting rule. +Use @option{memdev} with the @var{memory-backend-ram} backend, or @option{mem} (if +it's supported by the machine type in use), to define the mapping explicitly instead. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
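As a side note on what the deprecated behaviour actually does: when the user specifies nodes without memory, numa_auto_assign_ram splits the initial RAM between them. A simplified, illustrative model of the generic splitting rule (equal aligned shares, remainder to the last node) might look like the sketch below; the 8 MiB alignment is only an assumed default, boards can override it (e.g. the spapr hunk above sets numa_mem_align_shift = 28, i.e. 256 MiB).

```python
def default_split(ram_size, nb_nodes, align=1 << 23):
    """Simplified model of QEMU's implicit RAM split between NUMA nodes:
    every node but the last gets an equal share aligned down to `align`
    (8 MiB assumed here), and the last node absorbs the remainder."""
    share = (ram_size // nb_nodes) & ~(align - 1)
    sizes = [share] * (nb_nodes - 1)
    sizes.append(ram_size - share * (nb_nodes - 1))
    return sizes

# 4 GiB over 3 nodes: two equal aligned shares, remainder on the last node.
sizes = default_split(4 << 30, 3)
assert sum(sizes) == 4 << 30
assert sizes[0] == sizes[1]
```

This is exactly the kind of opaque, board-dependent distribution the deprecation warning steers users away from, in favour of explicit '-numa node,memdev' per-node sizing.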
[libvirt] [PATCH v4 1/3] machine: show if CLI option '-numa node, mem' is supported in QAPI schema
The legacy '-numa node,mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's not possible to replace it with the alternative '-numa node,memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping the option working with old machine types only. To help users find out whether the deprecated CLI option '-numa node,mem' is still supported by a particular machine type, add a new "numa-mem-supported" property to MachineInfo description in QAPI schema. "numa-mem-supported" is set to 'true' for machines that currently support NUMA, but it will be flipped to 'false' later on, once the deprecation period expires; it will be kept 'true' only for old machine types that used to support the legacy option, so existing configurations that use it won't break. Signed-off-by: Igor Mammedov --- Notes: v4: * drop idea to use "qom-list-properties" and use MachineInfo instead which could be inspected with 'query-machines' include/hw/boards.h | 3 +++ hw/arm/virt.c | 1 + hw/i386/pc.c| 1 + hw/ppc/spapr.c | 1 + qapi/misc.json | 5 ++++- vl.c| 1 + 6 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index 6f7916f..86894b6 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -158,6 +158,8 @@ typedef struct { * @kvm_type: *Return the type of KVM corresponding to the kvm-type string option or *computed based on other criteria such as the host kernel capabilities. 
+ * @numa_mem_supported: + *true if '-numa node,mem' option is supported and false otherwise */ struct MachineClass { /*< private >*/ @@ -210,6 +212,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; +bool numa_mem_supported; HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index bf54f10..481a603 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; +mc->numa_mem_supported = true; } static void virt_instance_init(Object *obj) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2632b73..05b8368 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2747,6 +2747,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; +mc->numa_mem_supported = true; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2ef3ce4..265ecfb 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; +mc->numa_mem_supported = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..d0bdccb 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -2018,12 +2018,15 @@ # # @hotpluggable-cpus: cpu hotplug via -device is supported (since 2.7.0) # +# @numa-mem-supported: true if '-numa node,mem' option is supported by machine +# type and false otherwise (since 
4.1) +# # Since: 1.2.0 ## { 'struct': 'MachineInfo', 'data': { 'name': 'str', '*alias': 'str', '*is-default': 'bool', 'cpu-max': 'int', -'hotpluggable-cpus': 'bool'} } +'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool'} } ## # @query-machines: diff --git a/vl.c b/vl.c index 5550bd7..5bf17f5 100644 --- a/vl.c +++ b/vl.c @@ -1520,6 +1520,7 @@ MachineInfoList *qmp_query_machines(Error **errp) info->name = g_strdup(mc->name); info->cpu_max = !mc->max_cpus ? 1 : mc->max_cpus; info->hotpluggable_cpus = mc->has_hotpluggable_cpus; +info->numa_mem_supported = mc->numa_mem_supported; entry = g_malloc0(sizeof(*entry)); entry->value = info; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v4 0/3] numa: deprecate '-numa node, mem' and default memory distribution
Changes since v3: - simplify series by dropping idea of showing property values in "qom-list-properties" and use MachineInfo in QAPI schema instead Changes since v2: - taking into account previous review, implement a way for mgmt to introspect if '-numa node,mem' is supported by machine type as suggested by Daniel at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html * amend "qom-list-properties" to show property values * add "numa-mem-supported" machine property to reflect if '-numa node,mem=SZ' is supported. It could be used with '-machine none' or at runtime with --preconfig before numa memory mappings are configured * minor fixes to deprecation documentation mentioning "numa-mem-supported" property 1) "I'm considering deprecating -mem-path/prealloc CLI options and replacing them with a single memdev Machine property to allow interested users to pick the backend used for initial RAM (fixes mixed -mem-path+hostmem backends issues) and as a transition step to modeling initial RAM as a Device instead of (ab)using MemoryRegion APIs." (for more details see: https://www.mail-archive.com/qemu-devel@nongnu.org/msg596314.html) However there are a couple of roadblocks on the way (s390x and numa memory handling). I think I finally thought out a way to hack s390x in a migration compatible manner, but I don't see any way to do it for -numa node,mem and default RAM assignment to nodes. Considering both numa use cases aren't meaningfully using NUMA (aside from guest-side testing) and could be replaced with an explicitly used memdev parameter, I'd like to propose removing these fake NUMA friends on new machine types, hence this deprecation. And once the last machine type that supported the option is removed we would be able to remove the option altogether. 
As a result of removing deprecated options and replacing initial RAM allocation with 'memdev's (1), QEMU will allocate guest RAM in a consistent way, fixing the mixed use case and allowing boards to move towards modelling initial RAM as Device(s), which in turn should allow cleaning up the NUMA/HMP/memory accounting code further by dropping ad-hoc node_mem tracking and reusing memory device enumeration instead. Reference to previous versions: * https://www.mail-archive.com/qemu-devel@nongnu.org/msg617694.html CC: libvir-list@redhat.com CC: ehabk...@redhat.com CC: pbonz...@redhat.com CC: berra...@redhat.com CC: arm...@redhat.com Igor Mammedov (3): machine: show if CLI option '-numa node,mem' is supported in QAPI schema numa: deprecate 'mem' parameter of '-numa node' option numa: deprecate implicit memory distribution between nodes include/hw/boards.h | 3 +++ hw/arm/virt.c| 1 + hw/i386/pc.c | 1 + hw/ppc/spapr.c | 1 + numa.c | 5 +++++ qapi/misc.json | 5 ++++- qemu-deprecated.texi | 24 ++++++++++++++++++++++++ vl.c | 1 + 8 files changed, 40 insertions(+), 1 deletion(-) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v4 2/3] numa: deprecate 'mem' parameter of '-numa node' option
The parameter allows configuring a fake NUMA topology, where the guest VM simulates a NUMA topology but doesn't actually get performance benefits from it. The same or better results could be achieved using the 'memdev' parameter. Besides unpredictable performance, the '-numa node,mem' option has other issues when it's used in combination with -mem-path + -mem-prealloc + memdev backends (pc-dimm), breaking binding of memdev backends, since mem-path/mem-prealloc are global and affect most RAM allocations. It's possible to make memdevs and the global -mem-path/mem-prealloc play nicely together, but that would just complicate already complicated code and add unobvious ways it could break on two different memory allocation paths and their combinations. Instead, consolidate all guest RAM allocation over memdev, which still allows creating fake NUMA configurations if desired and leaves one simplified code path to consider when it comes to guest RAM allocation. To achieve the desired simplification, deprecate the 'mem' parameter, as its ad-hoc partitioning of the initial RAM MemoryRegion can't be translated to a memdev-based backend transparently to users and in a compatible manner (migration wise). Later down the road that will allow consolidating how guest RAM is allocated and will permit us to clean up quite a bit of memory allocation and numa code, leaving only the 'memdev' implementation in place. 
Signed-off-by: Igor Mammedov --- Notes: v4: * fix up documentation to mention where users should look to check if -numa node,mem is supported numa.c | 2 ++ qemu-deprecated.texi | 16 ++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/numa.c b/numa.c index 3875e1e..2205773 100644 --- a/numa.c +++ b/numa.c @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, if (node->has_mem) { numa_info[nodenr].node_mem = node->mem; +warn_report("Parameter -numa node,mem is deprecated," +" use -numa node,memdev instead"); } if (node->has_memdev) { Object *o; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 50292d8..eb347f5 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -82,6 +82,22 @@ The @code{-realtime mlock=on|off} argument has been replaced by the The ``-virtfs_synth'' argument is now deprecated. Please use ``-fsdev synth'' and ``-device virtio-9p-...'' instead. +@subsection -numa node,mem=@var{size} (since 4.1) + +The parameter @option{mem} of @option{-numa node} is used to assign a part of +guest RAM to a NUMA node. But when using it, it's impossible to manage the specified +RAM chunk on the host side (e.g. bind it to a host node, set a bind policy, ...), +so the guest ends up with a fake NUMA configuration with suboptimal performance. +However, since 2014 there has been an alternative way to assign RAM to a NUMA node +using parameter @option{memdev}, which does the same as @option{mem} and adds +means to actually manage node RAM on the host side. Use parameter @option{memdev} +with the @var{memory-backend-ram} backend as a replacement for parameter @option{mem} +to achieve the same fake NUMA effect or a properly configured +@var{memory-backend-file} backend to actually benefit from the NUMA configuration. +In the future, new machine versions will not accept the option but it will still +work with old machine types. 
Users can check the QAPI schema to see if the legacy +option is supported by looking at the MachineInfo::numa-mem-supported property. + @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
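For users migrating away from the deprecated parameter, the replacement the documentation above recommends can be sketched as a small, hypothetical helper that builds the equivalent memdev-based arguments. The helper name is invented for illustration; the option syntax follows the deprecation text (memory-backend-ram keeps the "fake NUMA" behaviour, while a memory-backend-file with host-side binding is what adds real NUMA benefits).

```python
def mem_to_memdev(node_id, size):
    """Hypothetical helper: build the '-object'/'-numa' argument pair
    that replaces a legacy '-numa node,nodeid=N,mem=SIZE'."""
    backend = f"mem{node_id}"
    return [
        "-object", f"memory-backend-ram,id={backend},size={size}",
        "-numa", f"node,nodeid={node_id},memdev={backend}",
    ]

# '-numa node,nodeid=0,mem=1G' becomes:
assert mem_to_memdev(0, "1G") == [
    "-object", "memory-backend-ram,id=mem0,size=1G",
    "-numa", "node,nodeid=0,memdev=mem0",
]
```

Unlike the 'mem' form, the backend object produced here can later be extended with host-side options (e.g. a file path or binding policy) without changing the '-numa' part.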
Re: [libvirt] [Qemu-devel] [PATCH v3 1/6] pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size()
On Mon, 27 May 2019 18:36:25 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > QEMU will crash when device-memory-region-size property is read if > > ms->device_memory > > wasn't initialized yet (ex: property being inspected during preconfig > > time). > > Reproduced: > > $ qemu-system-x86_64 -nodefaults -S -display none -preconfig -qmp stdio > {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 4}, > "package": "v4.0.0-828-ga7b21f6762"}, "capabilities": ["oob"]}} > {"execute": "qmp_capabilities"} > {"return": {}} > {"execute": "qom-get", "arguments": {"path": "/machine", "property": > "device-memory-region-size"}} > Segmentation fault (core dumped) > > First time I started looking at this series, I went "I'll need a > reproducer to fully understand what's up, and I don't feel like finding > one now; next series, please". Second time, I had to spend a few > minutes on the reproducer. Wasn't hard, since you provided a clue. > Still: make review easy, include a reproducer whenever you can. sure > > > Instead of crashing return 0 if ms->device_memory hasn't been initialized. > > > > Signed-off-by: Igor Mammedov > > --- > > hw/i386/pc.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > > index d98b737..de91e90 100644 > > --- a/hw/i386/pc.c > > +++ b/hw/i386/pc.c > > @@ -2461,7 +2461,11 @@ pc_machine_get_device_memory_region_size(Object > > *obj, Visitor *v, > > Error **errp) > > { > > MachineState *ms = MACHINE(obj); > > -int64_t value = memory_region_size(&ms->device_memory->mr); > > +int64_t value = 0; > > + > > +if (ms->device_memory) { > > +value = memory_region_size(&ms->device_memory->mr); > > +} > > > > visit_type_int(v, name, &value, errp); > > } > > This makes qom-get return 0 for the size of memory that doesn't exist, > yet. > > A possible alternative would be setting an error. > > Opinions? 
We don't have a notion of a property not being set in QOM, so code that receives a text-based error would have to parse it (a horrible idea) to avoid generating the related ACPI parts. When memory hotplug is not enabled, PC_MACHINE_DEVMEM_REGION_SIZE == 0 is a valid value and it's what's expected by other code. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
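The design choice defended in the reply above, returning 0 for an unconfigured region rather than raising an error, can be modeled with a small illustrative sketch. This is hypothetical, dict-based Python, not QEMU code; it only mirrors the shape of the fixed getter.

```python
def device_memory_region_size(machine):
    """Model of the fixed getter: if device memory (memory hotplug) was
    never configured, report 0 instead of crashing or raising an error.
    0 is also the legitimate value when hotplug is disabled, which is
    why clients don't need a separate "not set" signal."""
    dm = machine.get("device_memory")
    return dm["size"] if dm is not None else 0

assert device_memory_region_size({}) == 0
assert device_memory_region_size({"device_memory": {"size": 1 << 30}}) == 1 << 30
```

A client probing at preconfig time therefore sees the same value, 0, whether hotplug is disabled or simply not configured yet, which is exactly the trade-off discussed above.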
Re: [libvirt] [Qemu-devel] [PATCH v3 4/6] numa: introduce "numa-mem-supported" machine property
On Mon, 27 May 2019 20:38:57 +0200 Markus Armbruster wrote: > Igor Mammedov writes: > > > The '-numa mem' option has a number of issues and mgmt often defaults > > to it. Unfortunately it's not possible to replace it with the alternative > > '-numa memdev' without breaking migration compatibility. > > To be precise: -numa node,mem=... and -numa node,memdev=... Correct? yep, I'll try to use the full syntax so it would be clear to others. > > What's possible > > though is to deprecate it, keeping the option working with old machine types. > > Once the deprecation period expires, QEMU will disable '-numa mem' option > > usage on new machine types and when the last machine type that supported > > it is removed we would be able to remove '-numa mem' with associated code. > > > > In order to help mgmt find out if the deprecated CLI option > > '-numa mem=SZ' is still supported by a particular machine type, expose > > this information via the "numa-mem-supported" machine property. > > > > Users can use the "qom-list-properties" QMP command to list machine type > > properties including initial property values (when probing for supported > > machine types with '-machine none') or at runtime at preconfig time > > before the numa mapping is configured and decide if they should use the legacy > > '-numa mem' or the alternative '-numa memdev' option. > > This sentence is impenetrable, I'm afraid :) > > If we only want to convey whether a machine type supports -numa > node,mem=..., then adding a flag to query-machines suffices. Since I'm > pretty sure you'd have figured that out yourself, I suspect I'm missing I didn't know about query-machines, hence I implemented the "qom-list-properties" approach as was discussed at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html For the purpose of deprecating '-numa node,mem', query-machines is more than enough. I'll drop patches 1-3 and respin the series using query-machines. > something. Can you give me some examples of intended usage? 
Perhaps there will be future use cases where introspecting the 'defaults' of objects is needed; then we could look back into qom-list-properties if there isn't a better alternative. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 3/6] qmp: qmp_qom_list_properties(): ignore empty string options
Current QAPI semantics return an empty "" string in case a string property value hasn't been set (i.e. is NULL). Do not show the initial value in this case in the "qom-list-properties" command output, to reduce clutter. Signed-off-by: Igor Mammedov --- qmp.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/qmp.c b/qmp.c index 8415541..463c7d4 100644 --- a/qmp.c +++ b/qmp.c @@ -41,6 +41,7 @@ #include "qom/object_interfaces.h" #include "hw/mem/memory-device.h" #include "hw/acpi/acpi_dev_interface.h" +#include "qapi/qmp/qstring.h" NameInfo *qmp_query_name(Error **errp) { @@ -596,7 +597,16 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, if (obj) { info->q_default = object_property_get_qobject(obj, info->name, NULL); -info->has_q_default = !!info->q_default; +if (info->q_default) { + if (qobject_type(info->q_default) == QTYPE_QSTRING) { + QString *value = qobject_to(QString, info->q_default); + if (!strcmp(qstring_get_str(value), "")) { + qobject_unref(info->q_default); + info->q_default = NULL; + } + } + info->has_q_default = !!info->q_default; +} } entry = g_malloc0(sizeof(*entry)); -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
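The filtering logic of the C hunk above can be illustrated with a short, hypothetical sketch: drop the reported default when it is an empty string, keep everything else. The property data is invented, and the 'default' key is only an approximation of the QAPI member name used by qom-list-properties.

```python
def filter_defaults(props):
    """Model of the C change above: drop a reported default value when
    it is an empty string, i.e. the string property was never set
    (NULL in QOM); non-empty and non-string defaults are kept."""
    cleaned = []
    for p in props:
        if p.get("default") == "":
            # Equivalent of qobject_unref() + clearing has_q_default.
            p = {k: v for k, v in p.items() if k != "default"}
        cleaned.append(p)
    return cleaned

props = [{"name": "kernel", "default": ""},
         {"name": "accel", "default": "tcg"}]
out = filter_defaults(props)
assert "default" not in out[0]   # unset string default suppressed
assert out[1]["default"] == "tcg"  # real default preserved
```

The point of the change is purely cosmetic for consumers: an absent member, rather than "", signals "never set", which reduces clutter in introspection output.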
[libvirt] [PATCH v3 4/6] numa: introduce "numa-mem-supported" machine property
The '-numa mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's not possible to replace it with the alternative '-numa memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping the option working with old machine types. Once the deprecation period expires, QEMU will disable '-numa mem' option usage on new machine types, and when the last machine type that supported it is removed we will be able to remove '-numa mem' with its associated code. To help mgmt find out whether the deprecated CLI option '-numa mem=SZ' is still supported by a particular machine type, expose this information via the "numa-mem-supported" machine property. Users can use the "qom-list-properties" QMP command to list machine type properties including initial property values (when probing for supported machine types with '-machine none'), or at runtime at preconfig time before the numa mapping is configured, and decide if they should use the legacy '-numa mem' or the alternative '-numa memdev' option. 
Signed-off-by: Igor Mammedov --- include/hw/boards.h | 1 + hw/arm/virt.c | 1 + hw/core/machine.c | 12 ++++++++++++ hw/i386/pc.c| 1 + hw/ppc/spapr.c | 1 + 5 files changed, 16 insertions(+) diff --git a/include/hw/boards.h b/include/hw/boards.h index 6f7916f..9e347cf 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -210,6 +210,7 @@ struct MachineClass { bool ignore_boot_device_suffixes; bool smbus_no_migration_support; bool nvdimm_supported; +bool numa_mem_supported; HotplugHandler *(*get_hotplug_handler)(MachineState *machine, DeviceState *dev); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 5331ab7..2e86c78 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1943,6 +1943,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; hc->plug = virt_machine_device_plug_cb; +mc->numa_mem_supported = true; } static void virt_instance_init(Object *obj) diff --git a/hw/core/machine.c b/hw/core/machine.c index 5d046a4..8bc53ba 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -506,6 +506,13 @@ static char *machine_get_nvdimm_persistence(Object *obj, Error **errp) return g_strdup(ms->nvdimms_state->persistence_string); } +static bool machine_get_numa_mem_supported(Object *obj, Error **errp) +{ +MachineClass *mc = MACHINE_GET_CLASS(obj); + +return mc->numa_mem_supported; +} + static void machine_set_nvdimm_persistence(Object *obj, const char *value, Error **errp) { @@ -810,6 +817,11 @@ static void machine_class_init(ObjectClass *oc, void *data) &error_abort); object_class_property_set_description(oc, "memory-encryption", "Set memory encryption object to use", &error_abort); + +object_class_property_add_bool(oc, "numa-mem-supported", +machine_get_numa_mem_supported, NULL, &error_abort); +object_class_property_set_description(oc, "numa-mem-supported", +"Shows if legacy '-numa mem=SIZE' option is supported", &error_abort); } static void 
machine_class_base_init(ObjectClass *oc, void *data) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index de91e90..bec0055 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2756,6 +2756,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) nc->nmi_monitor_handler = x86_nmi; mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE; mc->nvdimm_supported = true; +mc->numa_mem_supported = true; object_class_property_add(oc, PC_MACHINE_DEVMEM_REGION_SIZE, "int", pc_machine_get_device_memory_region_size, NULL, diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2ef3ce4..265ecfb 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4336,6 +4336,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) * in which LMBs are represented and hot-added */ mc->numa_mem_align_shift = 28; +mc->numa_mem_supported = true; smc->default_caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF; smc->default_caps.caps[SPAPR_CAP_VSX] = SPAPR_CAP_ON; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
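The probing flow described in the commit message can be sketched from the management side. This is a hypothetical Python model, not libvirt code: the reply shape mirrors the ObjectPropertyInfo additions from this series (including the optional 'default' member from patch 2/6), and the helper names are made up.

```python
# Hypothetical sketch: inspect a "qom-list-properties" reply (e.g. taken with
# '-machine none') for the machine type's "numa-mem-supported" property and
# pick the matching -numa CLI variant. Property/field names follow this
# series; everything else is invented for illustration.

def numa_mem_supported(props):
    """props: list of ObjectPropertyInfo dicts from qom-list-properties."""
    for p in props:
        if p.get("name") == "numa-mem-supported":
            # 'default' carries the initial property value (patch 2/6)
            return bool(p.get("default", False))
    return False  # property absent: QEMU too old to report it

def numa_cli_args(props, node_mb, backend_id="ram0"):
    """Build the -numa arguments for one node of node_mb megabytes."""
    if numa_mem_supported(props):
        return ["-numa", "node,mem=%dM" % node_mb]          # legacy path
    return ["-object",
            "memory-backend-ram,id=%s,size=%dM" % (backend_id, node_mb),
            "-numa", "node,memdev=%s" % backend_id]         # modern path

# Example replies (shape per the amended ObjectPropertyInfo struct):
old_machine = [{"name": "numa-mem-supported", "type": "bool", "default": True}]
new_machine = [{"name": "numa-mem-supported", "type": "bool", "default": False}]
```

Treating an absent property as "not supported" is the conservative choice here: a QEMU predating this series cannot be probed, so mgmt would fall back to the memdev syntax.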
[libvirt] [PATCH v3 6/6] numa: deprecate implicit memory distribution between nodes
Implicit RAM distribution between nodes has exactly the same issues as: "numa: deprecate 'mem' parameter of '-numa node' option" only with QEMU being the user that's 'adding' the 'mem' parameter. Deprecate it, to get it out of the way so that we can consolidate guest RAM allocation using memory backends, making it consistent, and possibly later on transition to using memory devices instead of ad-hoc memory mapping of initial RAM. --- v3: - update deprecation doc, s/4.0/4.1/ - mention that legacy 'mem' option could also be used to provide explicit memory distribution for old machine types Signed-off-by: Igor Mammedov --- numa.c | 3 +++ qemu-deprecated.texi | 8 2 files changed, 11 insertions(+) diff --git a/numa.c b/numa.c index 2205773..6d45a1f 100644 --- a/numa.c +++ b/numa.c @@ -409,6 +409,9 @@ void numa_complete_configuration(MachineState *ms) if (i == nb_numa_nodes) { assert(mc->numa_auto_assign_ram); mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size); +warn_report("Default splitting of RAM between nodes is deprecated," +" use '-numa node,memdev' to explicitly define RAM" +" allocation per node"); } numa_total = 0; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 995a96c..546f722 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -88,6 +88,14 @@ In future new machine versions will not accept the option but it will keep working with old machine types. User can inspect read-only machine property 'numa-mem-supported' to check if specific machine type (not) supports the option. +@subsection -numa node (without memory specified) (since 4.1) + +Splitting RAM by default between NUMA nodes has the same issues as the @option{mem} +parameter described above, with the difference that it is QEMU itself that plays +the role of the user, via an implicit generic or board-specific splitting rule. +Use @option{memdev} with a @var{memory-backend-ram} backend or @option{mem} (if +it's supported by the machine type in use) to define the mapping explicitly instead. 
+ @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 5/6] numa: deprecate 'mem' parameter of '-numa node' option
The parameter allows configuring a fake NUMA topology, where the guest VM simulates a NUMA topology without actually getting any performance benefit from it. The same or better results could be achieved using the 'memdev' parameter. In light of that, any VM that uses NUMA to get its benefits should use 'memdev'. To allow transitioning initial RAM to a device-based model, deprecate the 'mem' parameter, as its ad-hoc partitioning of the initial RAM MemoryRegion can't be translated to a memdev-based backend transparently to users and in a compatible manner (migration wise). That will also allow cleaning up our numa code a bit, leaving only the 'memdev' impl. in place and several boards that use node_mem to generate FDT/ACPI description from it. Signed-off-by: Igor Mammedov --- v3: * mention "numa-mem-supported" machine property in deprecation documentation. --- numa.c | 2 ++ qemu-deprecated.texi | 16 2 files changed, 18 insertions(+) diff --git a/numa.c b/numa.c index 3875e1e..2205773 100644 --- a/numa.c +++ b/numa.c @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, if (node->has_mem) { numa_info[nodenr].node_mem = node->mem; +warn_report("Parameter -numa node,mem is deprecated," +" use -numa node,memdev instead"); } if (node->has_memdev) { Object *o; diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 842e71b..995a96c 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -72,6 +72,22 @@ backend settings instead of environment variables. To ease migration to the new format, the ``-audiodev-help'' option can be used to convert the current values of the environment variables to ``-audiodev'' options. +@subsection -numa node,mem=@var{size} (since 4.1) + +The parameter @option{mem} of @option{-numa node} is used to assign a part of +guest RAM to a NUMA node. 
But when using it, it's impossible to manage the specified +size on the host side (like binding it to a host node, setting a bind policy, ...), +so the guest ends up with a fake NUMA configuration with suboptimal performance. +However, since 2014 there is an alternative way to assign RAM to a NUMA node +using the parameter @option{memdev}, which does the same as @option{mem} and provides +means to actually manage node RAM on the host side. Use parameter @option{memdev} +with a @var{memory-backend-ram} backend as a replacement for parameter @option{mem} +to achieve the same fake NUMA effect, or a properly configured +@var{memory-backend-file} backend to actually benefit from the NUMA configuration. +In future, new machine versions will not accept the option but it will keep +working with old machine types. Users can inspect the read-only machine property +'numa-mem-supported' to check if a specific machine type does (not) support the option. + @section QEMU Machine Protocol (QMP) commands @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
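As a rough illustration of what the numa.c hunk above does, here is a toy Python model of parse_numa_node() — a sketch with simplified names, not QEMU's implementation: the legacy 'mem' path now emits a deprecation warning, while the 'memdev' path stays silent.

```python
# Toy model of the patched parse_numa_node(): record each node's memory
# configuration and warn (once per 'mem' use) that the option is deprecated.

warnings = []

def warn_report(msg):
    # stand-in for QEMU's warn_report(); just collects messages here
    warnings.append(msg)

def parse_numa_node(node, numa_info):
    """node: dict of -numa node,... options; numa_info: per-node state."""
    if "mem" in node:
        numa_info[node["nodeid"]] = {"node_mem": node["mem"]}
        warn_report("Parameter -numa node,mem is deprecated,"
                    " use -numa node,memdev instead")
    elif "memdev" in node:
        numa_info[node["nodeid"]] = {"memdev": node["memdev"]}
```

Note the warning fires per parsed node, matching the placement of the warn_report() call inside the has_mem branch of the real C code.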
[libvirt] [PATCH v3 1/6] pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size()
QEMU will crash when the device-memory-region-size property is read if ms->device_memory wasn't initialized yet (e.g. the property being inspected during preconfig time). Instead of crashing, return 0 if ms->device_memory hasn't been initialized. Signed-off-by: Igor Mammedov --- hw/i386/pc.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index d98b737..de91e90 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2461,7 +2461,11 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v, Error **errp) { MachineState *ms = MACHINE(obj); -int64_t value = memory_region_size(&ms->device_memory->mr); +int64_t value = 0; + +if (ms->device_memory) { +value = memory_region_size(&ms->device_memory->mr); +} visit_type_int(v, name, &value, errp); } -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 0/6] numa: deprecate '-numa node, mem' and default memory distribution
Changes since v2: - taking into account previous review, implement a way for mgmt to introspect if '-numa node,mem' is supported by a machine type, as suggested by Daniel at https://www.mail-archive.com/qemu-devel@nongnu.org/msg601220.html * amend "qom-list-properties" to show property values * add "numa-mem-supported" machine property to reflect if '-numa node,mem=SZ' is supported. It could be used with '-machine none' or at runtime with --preconfig before numa memory mappings are configured * minor fixes to deprecation documentation mentioning "numa-mem-supported" property 1) "I'm considering deprecating -mem-path/prealloc CLI options and replacing them with a single memdev Machine property to allow interested users to pick the backend used for initial RAM (fixes mixed -mem-path+hostmem backends issues) and as a transition step to modeling initial RAM as a Device instead of (ab)using MemoryRegion APIs." (for more details see: https://www.mail-archive.com/qemu-devel@nongnu.org/msg596314.html) However there are a couple of roadblocks on the way (s390x and numa memory handling). I think I finally thought out a way to hack s390x in a migration-compatible manner, but I don't see any way to do it for -numa node,mem and default RAM assignment to nodes. Considering both numa usecases aren't meaningfully using NUMA (aside from guest-side testing) and could be replaced with an explicitly used memdev parameter, I'd like to propose removing these fake NUMA friends on new machine types, hence this deprecation. And once the last machine type that supported the option is removed, we would be able to remove the option altogether. As a result of removing the deprecated options and replacing initial RAM allocation with 'memdev's (1), QEMU will allocate guest RAM in a consistent way, fixing the mixed use-case and allowing boards to move towards modelling initial RAM as Device(s). 
That in turn should allow cleaning up the NUMA/HMP/memory accounting code further by dropping ad-hoc node_mem tracking and reusing memory device enumeration instead. Reference to previous versions: * [PATCH 0/2] numa: deprecate -numa node, mem and default memory distribution https://www.mail-archive.com/qemu-devel@nongnu.org/msg600706.html * [PATCH] numa: warn if numa 'mem' option or default RAM splitting between nodes is used. https://www.mail-archive.com/qemu-devel@nongnu.org/msg602136.html * [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used. https://www.spinics.net/linux/fedora/libvir/msg180917.html CC: libvir-list@redhat.com CC: ehabk...@redhat.com CC: pbonz...@redhat.com CC: berra...@redhat.com CC: arm...@redhat.com Igor Mammedov (6): pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size() qmp: make "qom-list-properties" show initial property values qmp: qmp_qom_list_properties(): ignore empty string options numa: introduce "numa-mem-supported" machine property numa: deprecate 'mem' parameter of '-numa node' option numa: deprecate implicit memory distribution between nodes include/hw/boards.h | 1 + hw/arm/virt.c| 1 + hw/core/machine.c| 12 hw/i386/pc.c | 7 ++- hw/ppc/spapr.c | 1 + numa.c | 5 + qapi/misc.json | 5 - qemu-deprecated.texi | 24 qmp.c| 15 +++ 9 files changed, 69 insertions(+), 2 deletions(-) -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH v3 2/6] qmp: make "qom-list-properties" show initial property values
Add in the command output object's property values right after creation (i.e. state of the object returned by object_new() or equivalent). Follow up patch will add machine property 'numa-mem-supported', which would allow mgmt to introspect which machine types (versions) still support legacy "-numa mem=FOO" CLI option and which don't and require alternative '-numa memdev' option being used. Signed-off-by: Igor Mammedov --- qapi/misc.json | 5 - qmp.c | 5 + 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/qapi/misc.json b/qapi/misc.json index 8b3ca4f..e333285 100644 --- a/qapi/misc.json +++ b/qapi/misc.json @@ -1365,10 +1365,13 @@ # # @description: if specified, the description of the property. # +# @default: initial property value. +# # Since: 1.2 ## { 'struct': 'ObjectPropertyInfo', - 'data': { 'name': 'str', 'type': 'str', '*description': 'str' } } + 'data': { 'name': 'str', 'type': 'str', '*description': 'str', +'*default': 'any' } } ## # @qom-list: diff --git a/qmp.c b/qmp.c index b92d62c..8415541 100644 --- a/qmp.c +++ b/qmp.c @@ -593,6 +593,11 @@ ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename, info->type = g_strdup(prop->type); info->has_description = !!prop->description; info->description = g_strdup(prop->description); +if (obj) { +info->q_default = +object_property_get_qobject(obj, info->name, NULL); +info->has_q_default = !!info->q_default; +} entry = g_malloc0(sizeof(*entry)); entry->value = info; -- 2.7.4 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 20 Mar 2019 15:24:42 + Daniel P. Berrangé wrote: > On Wed, Mar 20, 2019 at 04:20:19PM +0100, Igor Mammedov wrote: > > > This could be solved if QEMU has some machine type based property > > > that indicates whether "memdev" is required for a given machine, > > > but crucially *does not* actually activate that property until > > > several releases later. > > > > > > We're too late for 4.0, so let's consider QEMU 4.1 as the > > > next release of QEMU, which opens for dev in April 2019. > > > > > > QEMU 4.1 could introduce a machine type property "requires-memdev" > > > which defaults to "false" for all existing machine types. It > > > could add a deprecation that says a *future* machine type will > > > report "requires-memdev=true". IOW, "pc-i440fx-4.1" and > > > "pc-i440fx-4.2" must still report "requires-memdev=false", > > > > > > Libvirt 5.4.0 (May 2019) can now add support for "requires-memdev" > > > property. This would be effectively a no-op at time of this libvirt > > > release, since no QEMU would be reporting "requires-memdev=true" > > > for many months to come yet. > > > > > > Now, after 2 QEMU releases with the deprecation warning, when > > > the QEMU 5.0.0 dev cycle opens in Jan 2020, the new "pc-i440fx-5.0" > > > machine type can be made to report "requires-memdev=true". > > > > > > IOW, in April 2020 when QEMU 5.0.0 comes out, "mem" would > > > no longer be supported for new machine types. Libvirt at this > > ^^^ > > > > > time would be up to 6.4.0 but that's co-incidental since it > > > would already be doing the right thing since 5.4.0. > > > > > > IOW, this QEMU 5.0.0 would work correctly with libvirt versions > > > in the range 5.4.0 to 6.4.0 (and future). 
> > > > > If a user had libvirt < 5.4.0 (ie older than May 2019) nothing > > > would stop them using the "pc-i440fx-5.0" machine type, but > > > libvirt would be liable to use "mem" instead of "memdev" and > > > > > if that happened they would be unable to live migrate to a > > > host newer libvirt which honours "requires-memdev=true" > > I failed to parse this section in connection with the '^'-underlined part, > > I'm reading 'no longer be supported' as it's not possible to start > > QEMU -M machine_foo.requires-memdev=true with 'mem' option. > > Is it what you've meant? > > I wasn't actually meaning QEMU to forbid it when i wrote this, > but on reflection, it would make sense to forbid it, as that > would avoid the user getting into a messy situation with > versions of libvirt that predate knowledge of the requires-memdev > property. Forbidding is my goal as it (at least for new machine types): - removes the possibility of mis-configuration - allows new machines to switch to the frontend-backend memory model in a clean way, consolidating/unifying memory management (i.e. no need to map 'mem' to memdev, which from a recent migration experiment appears to be impossible to do reliably) - allows removing 'mem' and all related code from QEMU someday, once the last old machine type where it was possible to use it is removed (well, it's a rather far-fetched goal; for that we need to come up with a schedule/policy for how/when we would deprecate old machines). > > > So in summary the key to being able to tie deprecations to machine > > > type versions, is for QEMU to add a mechanism to report the desired > > > new feature usage approach against the machine type, but then ensure > > > the mechanism continues to report the old approach for 2 more releases. > > > > so that makes QEMU deprecation period effectively 3 releases (assuming > > 4 months cadence). > > There's a distinction between releases and development cycles here. 
> The deprecation policy is defined as 2 releases, which means between > 2 and 3 development cycles depending on when in the dev cycle the > deprecation is added (start vs the end of the dev cycle) > > Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 20 Mar 2019 13:46:59 + Daniel P. Berrangé wrote: > On Wed, Mar 20, 2019 at 10:32:53AM -0300, Eduardo Habkost wrote: > > On Wed, Mar 20, 2019 at 11:51:51AM +, Daniel P. Berrangé wrote: > > > On Wed, Mar 20, 2019 at 11:26:34AM +0100, Igor Mammedov wrote: > > [...] [...] > > If a feature is deprecated, I would expect the management stack > > to stop using the deprecated feature by default as soon as > > possible, not 1 year after it was deprecated. > > True, but the challenge here is that we need to stop using the > feature in a way that isn't going to break ability to live migrate > VMs spawned by previous versions of libvirt. The VM should be able to start in the first place: if we disable 'mem' on new machines, old libvirt using 'mem' won't be able to start a VM with it, so it will never even get to the migration point. (It's a clear signal to the user about a mis-configured host; at least this old/new issue shouldn't happen downstream, as downstream ships a compatible set of packages.) [...] > > Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 20 Mar 2019 11:51:51 + Daniel P. Berrangé wrote: > On Wed, Mar 20, 2019 at 11:26:34AM +0100, Igor Mammedov wrote: > > On Tue, 19 Mar 2019 14:51:07 + > > Daniel P. Berrangé wrote: > > > > > On Tue, Mar 19, 2019 at 02:08:01PM +0100, Igor Mammedov wrote: > > > > On Thu, 7 Mar 2019 10:07:05 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Wed, Mar 06, 2019 at 07:54:17PM +0100, Igor Mammedov wrote: > > > > > > On Wed, 6 Mar 2019 18:16:08 + > > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > > > > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > > > > > > > > > > > > > > Amend -numa option docs and print warnings if 'mem' option or > > > > > > > > default RAM > > > > > > > > splitting between nodes is used. It's intended to discourage > > > > > > > > users from using > > > > > > > > configuration that allows only to fake NUMA on guest side while > > > > > > > > leading > > > > > > > > to reduced performance of the guest due to inability to > > > > > > > > properly configure > > > > > > > > VM's RAM on the host. > > > > > > > > > > > > > > > > In NUMA case, it's recommended to always explicitly configure > > > > > > > > guest RAM > > > > > > > > using -numa node,memdev={backend-id} option. 
> > > > > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > > > --- > > > > > > > > numa.c | 5 + > > > > > > > > qemu-options.hx | 12 > > > > > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > > > diff --git a/numa.c b/numa.c > > > > > > > > index 3875e1e..42838f9 100644 > > > > > > > > --- a/numa.c > > > > > > > > +++ b/numa.c > > > > > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState > > > > > > > > *ms, NumaNodeOptions *node, > > > > > > > > > > > > > > > > if (node->has_mem) { > > > > > > > > numa_info[nodenr].node_mem = node->mem; > > > > > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > > > > > +" use -numa node,memdev instead"); > > > > > > > > > > > > > > My comments from v1 still apply. We must not do this as long as > > > > > > > libvirt has no choice but to continue using this feature. > > > > > > It has a choice to use 'memdev' whenever creating a new VM and > > > > > > continue > > > > > > using 'mem' with exiting VMs. > > > > > > > > > > Unfortunately we don't have such a choice. Libvirt has no concept of > > > > > the > > > > > distinction between an 'existing' and 'new' VM. It just receives an > > > > > XML > > > > > file from the mgmt application and with transient guests, we have no > > > > > persistent configuration record of the VM. So we've no way of knowing > > > > > whether this VM was previously running on this same host, or another > > > > > host, or is completely new. 
> > > > In case of transient VM, libvirt might be able to use machine version > > > > as deciding which option to use (memdev is around more than 4 years > > > > since 2.1) > > > > (or QEMU could provide introspection into what machine version > > > > (not)supports, > > > > like it was discussed before) > > > > > > > > As discussed elsewhere (v1 tread|IRC), there are users (mainly CI) for > > > > which > > > > fake NUMA is sufficient and they do not ask for explicit pinning, so > > > > libvirt > > > > defaults to legacy -numa node,mem option. > > > > Those users do not care no aware that they should use memdev instead > > > > (I'm n
Re: [libvirt] [Qemu-devel] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Tue, 19 Mar 2019 14:51:07 + Daniel P. Berrangé wrote: > On Tue, Mar 19, 2019 at 02:08:01PM +0100, Igor Mammedov wrote: > > On Thu, 7 Mar 2019 10:07:05 + > > Daniel P. Berrangé wrote: > > > > > On Wed, Mar 06, 2019 at 07:54:17PM +0100, Igor Mammedov wrote: > > > > On Wed, 6 Mar 2019 18:16:08 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > > > > > Amend -numa option docs and print warnings if 'mem' option or > > > > > > default RAM > > > > > > splitting between nodes is used. It's intended to discourage users > > > > > > from using > > > > > > configuration that allows only to fake NUMA on guest side while > > > > > > leading > > > > > > to reduced performance of the guest due to inability to properly > > > > > > configure > > > > > > VM's RAM on the host. > > > > > > > > > > > > In NUMA case, it's recommended to always explicitly configure guest > > > > > > RAM > > > > > > using -numa node,memdev={backend-id} option. > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > --- > > > > > > numa.c | 5 + > > > > > > qemu-options.hx | 12 > > > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/numa.c b/numa.c > > > > > > index 3875e1e..42838f9 100644 > > > > > > --- a/numa.c > > > > > > +++ b/numa.c > > > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > > > > > NumaNodeOptions *node, > > > > > > > > > > > > if (node->has_mem) { > > > > > > numa_info[nodenr].node_mem = node->mem; > > > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > > > +" use -numa node,memdev instead"); > > > > > > > > > > My comments from v1 still apply. We must not do this as long as > > > > > libvirt has no choice but to continue using this feature. > > > > It has a choice to use 'memdev' whenever creating a new VM and continue > > > > using 'mem' with exiting VMs. > > > > > > Unfortunately we don't have such a choice. 
Libvirt has no concept of the > > > distinction between an 'existing' and 'new' VM. It just receives an XML > > > file from the mgmt application and with transient guests, we have no > > > persistent configuration record of the VM. So we've no way of knowing > > > whether this VM was previously running on this same host, or another > > > host, or is completely new. > > In case of transient VM, libvirt might be able to use machine version > > as deciding which option to use (memdev is around more than 4 years since > > 2.1) > > (or QEMU could provide introspection into what machine version > > (not)supports, > > like it was discussed before) > > > > As discussed elsewhere (v1 tread|IRC), there are users (mainly CI) for which > > fake NUMA is sufficient and they do not ask for explicit pinning, so libvirt > > defaults to legacy -numa node,mem option. > > Those users do not care no aware that they should use memdev instead > > (I'm not sure if they are able to ask libvirt for non pinned numa memory > > which results in memdev being used). > > This patch doesn't obsolete anything yet, it serves purpose to inform users > > that they are using legacy option and advises replacement option > > so that users would know to what they should adapt to. > > > > Once we deprecate and then remove 'mem' for new machines only (while keeping > > 'mem' working on old machine versions). The new nor old libvirt won't be > > able > > to start new machine type with 'mem' option and have to use memdev variant, > > so we don't have migration issues with new machines and old ones continue > > working with 'mem'. > > I'm not seeing what has changed which would enable us to deprecate > something only for new machines. That's not possible from libvirt's > POV as old
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Sun, 10 Mar 2019 11:14:08 +0100 Markus Armbruster wrote: > Daniel P. Berrangé writes: > > > On Mon, Mar 04, 2019 at 12:45:14PM +0100, Markus Armbruster wrote: > >> Daniel P. Berrangé writes: > >> > >> > On Mon, Mar 04, 2019 at 08:13:53AM +0100, Markus Armbruster wrote: > >> >> If we deprecate outdated NUMA configurations now, we can start rejecting > >> >> them with new machine types after a suitable grace period. > >> > > >> > How is libvirt going to know what machines it can use with the feature ? > >> > We don't have any way to introspect machine type specific logic, since we > >> > run all probing with "-machine none", and QEMU can't report anything > >> > about > >> > machines without instantiating them. > >> > >> Fair point. A practical way for management applications to decide which > >> of the two interfaces they can use with which machine type may be > >> required for deprecating one of the interfaces with new machine types. > > > > We currently have "qom-list-properties" which can report on the > > existance of properties registered against object types. What it > > can't do though is report on the default values of these properties. > > Yes. > > > What's interesting though is that qmp_qom_list_properties will actually > > instantiate objects in order to query properties, if the type isn't an > > abstract type. > > If it's an abstract type, qom-list-properties returns the properties > created with object_class_property_add() & friends, typically by the > class_init method. This is possible without instantiating the type. > > If it's a concrete type, qom-list-properties additionally returns the > properties created with object_property_add(), typically by the > instance_init() method. This requires instantiating the type. > > Both kinds of properties can be added or deleted at any time. For > instance, setting a property value with object_property_set() or similar > could create additional properties. 
> > For historical reasons, we often use object_property_add() where > object_class_property_add() would do. Sad. > > > IOW, even if you are running "$QEMU -machine none", then if at the qmp-shell > > you do > > > >(QEMU) qom-list-properties typename=pc-q35-2.6-machine > > > > it will have actually instantiated the pc-q35-2.6-machine machine type. > > Since it has instantiated the machine, the object initializer function > > will have run and initialized the default values for various properties. > > > > IOW, it is possible for qom-list-properties to report on default values > > for non-abstract types. > > instance_init() also initializes the properties' values. > qom-list-properties could show these initial values (I hesitate calling > them default values). > > Setting a property's value can change other properties' values by side > effect. > > My point is: the properties qom-list-properties shows and the initial > values it could show are not necessarily final. QOM is designed to be > maximally flexible, and flexibility brings along its bosom-buddy > complexity. > > If you keep that in mind, qom-list-properties can be put to good use all > the same. > > A way to report "default values" (really: whatever the values are after > object_new()) feels like a fair feature request to me, if backed by an > actual use case. Looks like trying to migrate from 'mem' to 'memdev' just creates another train-wreck (where libvirt would have to hunt for the 'right' backend configuration to make migration work, and even that would be a best-effort attempt). If that worked reliably, I'd go for it since it would allow dropping the 'mem' codepath altogether, but it doesn't look possible. So I'll look into adding machine-level introspection and deprecating the 'mem' option for new machine types. > [...] > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
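Markus's distinction between class properties and instance properties can be modelled in a few lines. This is a toy sketch with invented names, not the actual QOM API: class-level properties are visible without instantiation, while instance properties only appear after the equivalent of object_new() — which is why qom-list-properties must instantiate concrete types, and why only then can it report their initial values.

```python
# Toy model of the QOM behaviour described above. Properties are stored as
# name -> initial value; real QOM properties are richer objects.

class QOMType:
    class_properties = {}                # added by class_init (always visible)

    def __init__(self):
        self.instance_properties = {}    # added by instance_init

def list_properties(type_cls, instantiate):
    """Mimics qom-list-properties: class properties come for free; instance
    properties (and their initial values) require instantiating the type."""
    props = dict(type_cls.class_properties)
    if instantiate:                      # only possible for concrete types
        props.update(type_cls().instance_properties)
    return props

class MachineNone(QOMType):
    # class property from this series (initial value per machine_class_init)
    class_properties = {"numa-mem-supported": True}

    def __init__(self):
        super().__init__()
        # hypothetical instance property, invisible without object_new()
        self.instance_properties = {"accel": "tcg"}
```

As Markus notes, these are *initial* values, not finals: in real QOM, setting one property can add, remove, or change others as a side effect, so the listing is a snapshot taken right after construction.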
Re: [libvirt] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Thu, 7 Mar 2019 10:07:05 + Daniel P. Berrangé wrote: > On Wed, Mar 06, 2019 at 07:54:17PM +0100, Igor Mammedov wrote: > > On Wed, 6 Mar 2019 18:16:08 + > > Daniel P. Berrangé wrote: > > > > > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > > > Amend -numa option docs and print warnings if 'mem' option or default > > > > RAM > > > > splitting between nodes is used. It's intended to discourage users from > > > > using > > > > configuration that allows only to fake NUMA on guest side while leading > > > > to reduced performance of the guest due to inability to properly > > > > configure > > > > VM's RAM on the host. > > > > > > > > In NUMA case, it's recommended to always explicitly configure guest RAM > > > > using -numa node,memdev={backend-id} option. > > > > > > > > Signed-off-by: Igor Mammedov > > > > --- > > > > numa.c | 5 + > > > > qemu-options.hx | 12 > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/numa.c b/numa.c > > > > index 3875e1e..42838f9 100644 > > > > --- a/numa.c > > > > +++ b/numa.c > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > > > NumaNodeOptions *node, > > > > > > > > if (node->has_mem) { > > > > numa_info[nodenr].node_mem = node->mem; > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > +" use -numa node,memdev instead"); > > > > > > My comments from v1 still apply. We must not do this as long as > > > libvirt has no choice but to continue using this feature. > > It has a choice to use 'memdev' whenever creating a new VM and continue > > using 'mem' with exiting VMs. > > Unfortunately we don't have such a choice. Libvirt has no concept of the > distinction between an 'existing' and 'new' VM. It just receives an XML > file from the mgmt application and with transient guests, we have no > persistent configuration record of the VM. 
So we've no way of knowing > whether this VM was previously running on this same host, or another > host, or is completely new. In the case of a transient VM, libvirt might be able to use the machine version for deciding which option to use (memdev has been around for more than 4 years, since 2.1) (or QEMU could provide introspection into what a machine version does (not) support, like it was discussed before). As discussed elsewhere (v1 thread|IRC), there are users (mainly CI) for which fake NUMA is sufficient and they do not ask for explicit pinning, so libvirt defaults to the legacy -numa node,mem option. Those users do not care nor are they aware that they should use memdev instead (I'm not sure if they are able to ask libvirt for non-pinned numa memory which results in memdev being used). This patch doesn't obsolete anything yet; it serves the purpose of informing users that they are using a legacy option and advises a replacement option so that users know what they should adapt to. Once we deprecate and then remove 'mem' for new machines only (while keeping 'mem' working on old machine versions), neither new nor old libvirt will be able to start a new machine type with the 'mem' option and both will have to use the memdev variant, so we don't have migration issues with new machines and old ones continue working with 'mem'. That keeps QEMU's promise not to break existing configurations while letting us move forward with new machines. > Regards, > Daniel -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Mon, 4 Mar 2019 14:52:30 +0100 Igor Mammedov wrote: > On Fri, 1 Mar 2019 18:01:52 + > "Dr. David Alan Gilbert" wrote: > > > * Igor Mammedov (imamm...@redhat.com) wrote: > > > On Fri, 1 Mar 2019 15:49:47 + > > > Daniel P. Berrangé wrote: > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > > > > > The parameter allows to configure fake NUMA topology where guest > > > > > VM simulates NUMA topology but not actually getting a performance > > > > > benefits from it. The same or better results could be achieved > > > > > using 'memdev' parameter. In light of that any VM that uses NUMA > > > > > to get its benefits should use 'memdev' and to allow transition > > > > > initial RAM to device based model, deprecate 'mem' parameter as > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't be > > > > > translated to memdev based backend transparently to users and in > > > > > compatible manner (migration wise). > > > > > > > > > > That will also allow to clean up a bit our numa code, leaving only > > > > > 'memdev' impl. in place and several boards that use node_mem > > > > > to generate FDT/ACPI description from it. > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to -numa > > > > are 100% live migration compatible in both directions ? Libvirt > > > > would need this to be the case in order to use the 'memdev' syntax > > > > instead. > > > Unfortunately they are not migration compatible in any direction, > > > if it were possible to translate them to each other I'd alias 'mem' > > > to 'memdev' without deprecation. The former sends over only one > > > MemoryRegion to target, while the latter sends over several (one per > > > memdev). > > > > > > Mixed memory issue[1] first came from libvirt side RHBZ1624223, > > > back then it was resolved on libvirt side in favor of migration > > > compatibility vs correctness (i.e. bind policy doesn't work as expected). 
> > > What's worse is that it was made the default and affects all new machines, > > > as I understood it. > > > > > > In case of -mem-path + -mem-prealloc (with 1 numa node or numa less) > > > it's possible on QEMU side to make conversion to memdev in migration > > > compatible way (that's what stopped Michal from memdev approach). > > > But it's hard to do so in multi-nodes case as amount of MemoryRegions > > > is different. > > > > > > Point is to consider 'mem' as mis-configuration error, as the user > > > in the first place using broken numa configuration > > > (i.e. fake numa configuration doesn't actually improve performance). > > > > > > CCed David, maybe he could offer a way to do 1:n migration and other > > > way around. > > > > I can't see a trivial way. > > About the easiest I can think of is if you had a way to create a memdev > > that was an alias to pc.ram (of a particular size and offset). > If I get you right that's what I was planning to do for numa-less machines > that use -mem-path/prealloc options, where it's possible to replace > an initial RAM MemoryRegion with a correspondingly named memdev and its > backing MemoryRegion. > But I don't see how it could work in case of legacy NUMA 'mem' options > where initial RAM is 1 MemoryRegion (it's a fake numa after all) and how to > translate that into several MemoryRegions (one per node/memdev). Limiting it to x86 for demo purposes, what would work (if*) is to create a special MemoryRegion container, i.e.:
1. make the MemoryRegion created by memory_region_allocate_system_memory():memory_region_init() special: it already has the id 'pc.ram' and a size that matches the single RAMBlock with the same id in the incoming migration stream from the OLD qemu (started with -numa node,mem=x ... options)
2. register the region from step 1 with vmstate_register_ram_global() (or another API) which under the cover makes the migration code split the single incoming RAMBlock into several smaller consecutive RAMBlocks represented by memdev backends that are mapped as subregions within the container 'pc.ram'
3. in case of backward migration the container MemoryRegion 'pc.ram' serves the other way around, stitching the memdev subregions back into the single 'pc.
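A very rough sketch of the container idea proposed above (C-flavoured pseudocode, not a working patch; the split/stitch of the incoming RAMBlock across subregions would require migration-code support that does not exist, and the variable names are invented here):

```c
/* Pseudocode sketch of the 'pc.ram' container idea from this thread.
 * ASSUMPTION: hypothetical migration support that can split one
 * incoming RAMBlock across the container's subregions. */
MemoryRegion *container = g_new0(MemoryRegion, 1);

/* 1. container named 'pc.ram', sized like the legacy single RAMBlock */
memory_region_init(container, OBJECT(machine), "pc.ram", machine->ram_size);

uint64_t offset = 0;
for (int i = 0; i < nb_numa_nodes; i++) {
    /* one memdev-backed region per node, mapped consecutively so the
     * layout matches the old contiguous initial RAM */
    MemoryRegion *seg = host_memory_backend_get_memory(node_memdev[i]);
    memory_region_add_subregion(container, offset, seg);
    offset += memory_region_size(seg);
}

/* 2./3. registering the container would have to teach the migration
 * code to split incoming state into the subregions (old -> new) and to
 * stitch the subregions back into one block (new -> old) */
vmstate_register_ram_global(container);
```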
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Sun, 10 Mar 2019 11:16:33 +0100 Markus Armbruster wrote: > Daniel P. Berrangé writes: > > > On Wed, Mar 06, 2019 at 08:03:48PM +0100, Igor Mammedov wrote: > >> On Mon, 4 Mar 2019 16:35:16 + > >> Daniel P. Berrangé wrote: > >> > >> > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > >> > > We couldn't have done that. How we would migrate from older qemu? > >> > > > >> > > Anyway, now that I look into this (esp. git log) I came accross: > >> > > > >> > > commit f309db1f4d51009bad0d32e12efc75530b66836b > >> > > Author: Michal Privoznik > >> > > AuthorDate: Thu Dec 18 12:36:48 2014 +0100 > >> > > Commit: Michal Privoznik > >> > > CommitDate: Fri Dec 19 07:44:44 2014 +0100 > >> > > > >> > > qemu: Create memory-backend-{ram,file} iff needed > >> > > > >> > > Or this 7832fac84741d65e851dbdbfaf474785cbfdcf3c. We did try to > >> > > generated > >> > > newer cmd line but then for various reasong (e.g. avoiding triggering > >> > > a qemu > >> > > bug) we turned it off and make libvirt default to older (now > >> > > deprecated) cmd > >> > > line. > >> > > > >> > > Frankly, I don't know how to proceed. Unless qemu is fixed to allow > >> > > migration from deprecated to new cmd line (unlikely, if not impossible, > >> > > right?) then I guess the only approach we can have is that: > >> > > > >> > > 1) whenever so called cold booting a new machine (fresh, brand new > >> > > start of > >> > > a new domain) libvirt would default to modern cmd line, > >> > > > >> > > 2) on migration, libvirt would record in the migration stream (or > >> > > status XML > >> > > or wherever) that modern cmd line was generated and thus it'll make the > >> > > destination generate modern cmd line too. 
> >> > > > > This solution still suffers a couple of problems: > >> > > a) migration to older libvirt will fail as older libvirt won't > >> > > recognize the > >> > > flag set in 2) and therefore would default to deprecated cmd line > >> > > b) migrating from one host to another won't modernize the cmd line > >> > > > >> > > But I guess we have to draw a line somewhere (if we are not willing to > >> > > write > >> > > those migration patches). > >> > > >> > Yeah supporting backwards migration is a non-optional requirement from at > >> > least one of the mgmt apps using libvirt, so breaking the new to old case > >> > is something we always aim to avoid. > >> Aiming for support of > >> "new QEMU + new machine type" => "old QEMU + non-existing machine type" > >> seems a bit difficult. > > > > That's not the scenario that's the problem. The problem is > > > >new QEMU + new machine type + new libvirt -> new QEMU + new machine > > type + old libvirt > > > > Previously released versions of libvirt will happily use any new machine > > type that QEMU introduces. So we can't make new libvirt use different > > options, only for new machine types, as old libvirt supports those machine > > types too. > > Avoiding tight coupling between QEMU and libvirt versions makes sense, > because having to upgrade stuff in lock-step is such a pain. > > Does not imply we must support arbitrary combinations of QEMU and > libvirt versions. Isn't it typically the job of downstream to ship a bundle that works together, and isn't that a rather limited set? E.g. System 1 (libvirt 0, QEMU 0, machine 0.1 (latest)) could be migrated, both ways, to System 2 (libvirt 1, QEMU 1, machine 0.1 (still the same old machine)). While installing QEMU 1 on System 1 might work (if it doesn't break due to dependencies) and might even be able to start machine 1.0, wouldn't that really fall into the unsupported category? 
> Unless upstream libvirt's test matrix covers all versions of libvirt > against all released versions of QEMU, "previously released versions of > libvirt will continue to work with new QEMU" is largely an empty promise > anyway. The real promise is more like "we won't break it intentionally; > good luck". > > Mind, I'm not criticizing that real promis
Re: [libvirt] [PATCH] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Thu, 7 Mar 2019 10:04:56 + Daniel P. Berrangé wrote: > On Wed, Mar 06, 2019 at 07:48:22PM +0100, Igor Mammedov wrote: > > On Wed, 6 Mar 2019 17:10:37 + > > Daniel P. Berrangé wrote: > > > > > On Wed, Mar 06, 2019 at 05:58:35PM +0100, Igor Mammedov wrote: > > > > On Wed, 6 Mar 2019 16:39:38 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Wed, Mar 06, 2019 at 05:30:25PM +0100, Igor Mammedov wrote: > > > > > > Amend -numa option docs and print warnings if 'mem' option or > > > > > > default RAM > > > > > > splitting between nodes is used. It's intended to discourage users > > > > > > from using > > > > > > configuration that allows only to fake NUMA on guest side while > > > > > > leading > > > > > > to reduced performance of the guest due to inability to properly > > > > > > configure > > > > > > VM's RAM on the host. > > > > > > > > > > > > In NUMA case, it's recommended to always explicitly configure guest > > > > > > RAM > > > > > > using -numa node,memdev={backend-id} option. > > > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > --- > > > > > > numa.c | 5 + > > > > > > qemu-options.hx | 12 > > > > > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/numa.c b/numa.c > > > > > > index 3875e1e..c6c2a6f 100644 > > > > > > --- a/numa.c > > > > > > +++ b/numa.c > > > > > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > > > > > NumaNodeOptions *node, > > > > > > > > > > > > if (node->has_mem) { > > > > > > numa_info[nodenr].node_mem = node->mem; > > > > > > +warn_report("Parameter -numa node,mem is obsolete," > > > > > > +" use -numa node,memdev instead"); > > > > > > > > > > I don't think we should do this. Libvirt isn't going to stop using > > > > > this > > > > > option in the near term. 
When users see warnings like this in logs > > > > well when it was the only option available libvirt had no other choice, > > > > but since memdev became available libvirt should try to use it whenever > > > > possible. > > > > > > As we previously discussed, it is not possible for libvirt to use it > > > in all cases. > > > > > > > > > > > > they'll often file bugs reports thinking something is broken which is > > > > > not the case here. > > > > It's the exact purpose of the warning, to force user asking questions > > > > and fix configuration, since he/she obviously not getting NUMA benefits > > > > and/or performance-wise > > > > > > That's only useful if it is possible to do something about the problem. > > > Libvirt wants to use the new option but it can't due to the live migration > > > problems. So this simply leads to bug reports that will end up marked > > > as CANTFIX. > > The problem could be solved by user though, by reconfiguring and restarting > > domain since it's impossible to (at least as it stands now wrt migration). > > > > > I don't believe libvirt actually suffers from the performance problem > > > you describe wrt lack of pinning. When we attempt to pin guest NUMA > > > nodes to host NUMA nodes, libvirt *will* use "memdev". IIUC, we > > > use "mem" in the case where there /no/ requested pinning of guest > > > NUMA nodes, and so we're not suffering from the limitations of "mem" > > > in that case. > > What would be the use-case for not pinning numa nodes? > > If user isn't asking for pinning, VM would run with degraded performance and > > it would be better of being non-numa. > > The guest could have been originally booted on a host which has 2 NUMA > nodes and have been migrated to a host with 1 NUMA node, in which case > pinnning is not relevant. > > For CI purposes too it is reasonable to create guests with NUMA configurations > that bear no resemblance to the
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Mon, 4 Mar 2019 17:20:13 +0100 Michal Privoznik wrote: > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > >> On Mon, 4 Mar 2019 12:39:08 + > >> Daniel P. Berrangé wrote: > >> > >>> On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > >>>> On Mon, 04 Mar 2019 08:13:53 +0100 > >>>> Markus Armbruster wrote: > >>>> > >>>>> Daniel P. Berrangé writes: > >>>>> > >>>>>> On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > >>>>>>> On Fri, 1 Mar 2019 15:49:47 + > >>>>>>> Daniel P. Berrangé wrote: > >>>>>>> > >>>>>>>> On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov wrote: > >>>>>>>>> The parameter allows to configure fake NUMA topology where guest > >>>>>>>>> VM simulates NUMA topology but not actually getting a performance > >>>>>>>>> benefits from it. The same or better results could be achieved > >>>>>>>>> using 'memdev' parameter. In light of that any VM that uses NUMA > >>>>>>>>> to get its benefits should use 'memdev' and to allow transition > >>>>>>>>> initial RAM to device based model, deprecate 'mem' parameter as > >>>>>>>>> its ad-hoc partitioning of initial RAM MemoryRegion can't be > >>>>>>>>> translated to memdev based backend transparently to users and in > >>>>>>>>> compatible manner (migration wise). > >>>>>>>>> > >>>>>>>>> That will also allow to clean up a bit our numa code, leaving only > >>>>>>>>> 'memdev' impl. in place and several boards that use node_mem > >>>>>>>>> to generate FDT/ACPI description from it. > >>>>>>>> > >>>>>>>> Can you confirm that the 'mem' and 'memdev' parameters to -numa > >>>>>>>> are 100% live migration compatible in both directions ? Libvirt > >>>>>>>> would need this to be the case in order to use the 'memdev' syntax > >>>>>>>> instead. > >>>>>>> Unfortunately they are not migration compatible in any direction, > >>>>>>> if it where possible to translate them to each other I'd alias 'mem' > >>>>>>> to 'memdev' without deprecation. 
The former sends over only one > >>>>>>> MemoryRegion to target, while the later sends over several (one per > >>>>>>> memdev). > >>>>>> > >>>>>> If we can't migration from one to the other, then we can not deprecate > >>>>>> the existing 'mem' syntax. Even if libvirt were to provide a config > >>>>>> option to let apps opt-in to the new syntax, we need to be able to > >>>>>> support live migration of existing running VMs indefinitely. > >>>>>> Effectively > >>>>>> this means we need the to keep 'mem' support forever, or at least such > >>>>>> a long time that it effectively means forever. > >>>>>> > >>>>>> So I think this patch has to be dropped & replaced with one that > >>>>>> simply documents that memdev syntax is preferred. > >>>>> > >>>>> We have this habit of postulating absolutes like "can not deprecate" > >>>>> instead of engaging with the tradeoffs. We need to kick it. > >>>>> > >>>>> So let's have an actual look at the tradeoffs. > >>>>> > >>>>> We don't actually "support live migration of existing running VMs > >>>>> indefinitely". > >>>>> > >>>>> We support live migration to any newer version of QEMU that still > >>>>> supports the machine type. > >>>>> > >>>>> We support live migration to any older version of QEMU that already > >>>>> supports the machine type and all the devices the machine uses. > >>>>> > >>>>> Aside: "support" is really an h
Re: [libvirt] [Qemu-devel] [PATCH 1/2] numa: deprecate 'mem' parameter of '-numa node' option
On Mon, 4 Mar 2019 16:35:16 + Daniel P. Berrangé wrote: > On Mon, Mar 04, 2019 at 05:20:13PM +0100, Michal Privoznik wrote: > > On 3/4/19 3:24 PM, Daniel P. Berrangé wrote: > > > On Mon, Mar 04, 2019 at 03:16:41PM +0100, Igor Mammedov wrote: > > > > On Mon, 4 Mar 2019 12:39:08 + > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Mon, Mar 04, 2019 at 01:25:07PM +0100, Igor Mammedov wrote: > > > > > > On Mon, 04 Mar 2019 08:13:53 +0100 > > > > > > Markus Armbruster wrote: > > > > > > > Daniel P. Berrangé writes: > > > > > > > > On Fri, Mar 01, 2019 at 06:33:28PM +0100, Igor Mammedov wrote: > > > > > > > > > On Fri, 1 Mar 2019 15:49:47 + > > > > > > > > > Daniel P. Berrangé wrote: > > > > > > > > > > On Fri, Mar 01, 2019 at 04:42:15PM +0100, Igor Mammedov > > > > > > > > > > wrote: > > > > > > > > > > > The parameter allows to configure fake NUMA topology > > > > > > > > > > > where guest > > > > > > > > > > > VM simulates NUMA topology but not actually getting a > > > > > > > > > > > performance > > > > > > > > > > > benefits from it. The same or better results could be > > > > > > > > > > > achieved > > > > > > > > > > > using 'memdev' parameter. In light of that any VM that > > > > > > > > > > > uses NUMA > > > > > > > > > > > to get its benefits should use 'memdev' and to allow > > > > > > > > > > > transition > > > > > > > > > > > initial RAM to device based model, deprecate 'mem' > > > > > > > > > > > parameter as > > > > > > > > > > > its ad-hoc partitioning of initial RAM MemoryRegion can't > > > > > > > > > > > be > > > > > > > > > > > translated to memdev based backend transparently to users > > > > > > > > > > > and in > > > > > > > > > > > compatible manner (migration wise). > > > > > > > > > > > > > > > > > > > > > > That will also allow to clean up a bit our numa code, > > > > > > > > > > > leaving only > > > > > > > > > > > 'memdev' impl. 
in place and several boards that use > > > > > > > > > > > node_mem > > > > > > > > > > > to generate FDT/ACPI description from it. > > > > > > > > > > > > > > > > > > > > Can you confirm that the 'mem' and 'memdev' parameters to > > > > > > > > > > -numa > > > > > > > > > > are 100% live migration compatible in both directions ? > > > > > > > > > > Libvirt > > > > > > > > > > would need this to be the case in order to use the 'memdev' > > > > > > > > > > syntax > > > > > > > > > > instead. > > > > > > > > > Unfortunately they are not migration compatible in any > > > > > > > > > direction, > > > > > > > > > if it where possible to translate them to each other I'd > > > > > > > > > alias 'mem' > > > > > > > > > to 'memdev' without deprecation. The former sends over only > > > > > > > > > one > > > > > > > > > MemoryRegion to target, while the later sends over several > > > > > > > > > (one per > > > > > > > > > memdev). > > > > > > > > > > > > > > > > If we can't migration from one to the other, then we can not > > > > > > > > deprecate > > > > > > > > the existing 'mem' syntax. Even if libvirt were to provide a > > > > > > > > config > > > > > > > > option to let apps opt-in to the new syntax, we need to be able > > > > > > > > to > > >
Re: [libvirt] [PATCH v2] numa: warn if numa 'mem' option or default RAM splitting between nodes is used.
On Wed, 6 Mar 2019 18:16:08 + Daniel P. Berrangé wrote: > On Wed, Mar 06, 2019 at 06:33:25PM +0100, Igor Mammedov wrote: > > Amend -numa option docs and print warnings if 'mem' option or default RAM > > splitting between nodes is used. It's intended to discourage users from > > using > > configuration that allows only to fake NUMA on guest side while leading > > to reduced performance of the guest due to inability to properly configure > > VM's RAM on the host. > > > > In NUMA case, it's recommended to always explicitly configure guest RAM > > using -numa node,memdev={backend-id} option. > > > > Signed-off-by: Igor Mammedov > > --- > > numa.c | 5 + > > qemu-options.hx | 12 > > 2 files changed, 13 insertions(+), 4 deletions(-) > > > > diff --git a/numa.c b/numa.c > > index 3875e1e..42838f9 100644 > > --- a/numa.c > > +++ b/numa.c > > @@ -121,6 +121,8 @@ static void parse_numa_node(MachineState *ms, > > NumaNodeOptions *node, > > > > if (node->has_mem) { > > numa_info[nodenr].node_mem = node->mem; > > +warn_report("Parameter -numa node,mem is obsolete," > > +" use -numa node,memdev instead"); > > My comments from v1 still apply. We must not do this as long as > libvirt has no choice but to continue using this feature. It has a choice to use 'memdev' whenever creating a new VM and continue using 'mem' with existing VMs. 
> > > } > > if (node->has_memdev) { > > Object *o; > > @@ -407,6 +409,9 @@ void numa_complete_configuration(MachineState *ms) > > if (i == nb_numa_nodes) { > > assert(mc->numa_auto_assign_ram); > > mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, > > ram_size); > > +warn_report("Default splitting of RAM between nodes is > > obsolete," > > +" Use '-numa node,memdev' to explicitly define RAM" > > +" allocation per node"); > > } > > > > numa_total = 0; > > diff --git a/qemu-options.hx b/qemu-options.hx > > index 1cf9aac..61035cb 100644 > > --- a/qemu-options.hx > > +++ b/qemu-options.hx > > @@ -206,10 +206,14 @@ For example: > > -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1 > > @end example > > > > -@samp{mem} assigns a given RAM amount to a node. @samp{memdev} > > -assigns RAM from a given memory backend device to a node. If > > -@samp{mem} and @samp{memdev} are omitted in all nodes, RAM is > > -split equally between them. > > +@samp{memdev} assigns RAM from a given memory backend device to a node. > > + > > +Legacy options/behaviour: @samp{mem} assigns a given RAM amount to a node. > > +If @samp{mem} and @samp{memdev} are omitted in all nodes, RAM is split > > equally > > +between them. Option @samp{mem} and default RAM splitting are obsolete as > > they > > +do not provide means to manage RAM on the host side and only allow QEMU to > > fake > > +NUMA support which in practice could degrade VM performance. > > +It's advised to always explicitly configure NUMA RAM by using the > > @samp{memdev} option. > > > > @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore, > > if one node uses @samp{memdev}, all of them have to use it. > > -- > > 2.7.4 > > > > Regards, > Daniel
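For completeness, a sketch of the host-side management that @samp{memdev} enables and @samp{mem} cannot express: each backend can be bound to a host NUMA node (ids, sizes and host node numbers below are illustrative, not taken from the patch):

```shell
# Each guest node's RAM comes from a backend bound to a specific host
# NUMA node; the legacy 'mem' option has no equivalent of this.
qemu-system-x86_64 -m 4G \
    -object memory-backend-ram,id=m0,size=2G,host-nodes=0,policy=bind \
    -object memory-backend-ram,id=m1,size=2G,host-nodes=1,policy=bind \
    -numa node,nodeid=0,memdev=m0 \
    -numa node,nodeid=1,memdev=m1
```

Without the host-nodes/policy properties the same memdev syntax still works and merely reproduces the guest-visible topology, which is why a memdev-only configuration can also cover the non-pinned (CI-style) use case discussed earlier in the thread.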