Re: [PATCH v2] qemu-options: Deprecate "-runas" and introduce "-run-with user=..." instead

2024-05-08 Thread Claudio Fontana
On 5/6/24 13:20, Thomas Huth wrote:
> The old "-runas" option has the disadvantage that it is not visible
> in the QAPI schema, so it is not available via the normal introspection
> mechanisms. We've recently introduced the "-run-with" option for exactly
> this purpose, which is meant to handle the options that affect the
> runtime behavior. Thus let's introduce a "user=..." parameter here now
> and deprecate the old "-runas" option.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  v2: Add missing part in qemu-options.hx as suggested by Philippe
> 
>  docs/about/deprecated.rst |  6 ++++++
>  system/vl.c               | 15 +++++++++++++++
>  qemu-options.hx           | 15 +++++++++++----
>  3 files changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 3310df3274..fe69e2d44c 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -61,6 +61,12 @@ configurations (e.g. -smp drawers=1,books=1,clusters=1 for x86 PC machine) is
>  marked deprecated since 9.0, users have to ensure that all the topology members
>  described with -smp are supported by the target machine.
>  
> +``-runas`` (since 9.1)
> +----------------------
> +
> +Use ``-run-with user=...`` instead.
> +
> +
>  User-mode emulator command line arguments
>  -----------------------------------------
>  
> diff --git a/system/vl.c b/system/vl.c
> index 7756eac81e..b031427440 100644
> --- a/system/vl.c
> +++ b/system/vl.c
> @@ -773,6 +773,10 @@ static QemuOptsList qemu_run_with_opts = {
>          {
>              .name = "chroot",
>              .type = QEMU_OPT_STRING,
>          },
> +        {
> +            .name = "user",
> +            .type = QEMU_OPT_STRING,
> +        },
>          { /* end of list */ }
>      },
>  };
> @@ -3586,6 +3590,7 @@ void qemu_init(int argc, char **argv)
>              break;
>  #if defined(CONFIG_POSIX)
>          case QEMU_OPTION_runas:
> +            warn_report("-runas is deprecated, use '-run-with user=...' instead");
>              if (!os_set_runas(optarg)) {
>                  error_report("User \"%s\" doesn't exist"
>                               " (and is not <uid>:<gid>)",
> @@ -3612,6 +3617,16 @@ void qemu_init(int argc, char **argv)
>              if (str) {
>                  os_set_chroot(str);
>              }
> +            str = qemu_opt_get(opts, "user");
> +            if (str) {
> +                if (!os_set_runas(str)) {
> +                    error_report("User \"%s\" doesn't exist"
> +                                 " (and is not <uid>:<gid>)",
> +                                 str);
> +                    exit(1);
> +                }
> +            }
> +
>              break;
>  }
>  #endif /* CONFIG_POSIX */
> diff --git a/qemu-options.hx b/qemu-options.hx
> index cf61f6b863..3031479a15 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -4824,7 +4824,8 @@ DEF("runas", HAS_ARG, QEMU_OPTION_runas, \
>  SRST
>  ``-runas user``
>  Immediately before starting guest execution, drop root privileges,
> -switching to the specified user.
> +switching to the specified user. This option is deprecated, use
> +``-run-with user=...`` instead.
>  ERST
>  
>  DEF("prom-env", HAS_ARG, QEMU_OPTION_prom_env,
> @@ -4990,13 +4991,15 @@ DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "", QEMU_ARCH_ALL)
>  
>  #ifdef CONFIG_POSIX
>  DEF("run-with", HAS_ARG, QEMU_OPTION_run_with,
> -"-run-with [async-teardown=on|off][,chroot=dir]\n"
> +"-run-with [async-teardown=on|off][,chroot=dir][user=username|uid:gid]\n"
>  "Set miscellaneous QEMU process lifecycle options:\n"
>  "async-teardown=on enables asynchronous teardown (Linux 
> only)\n"
> -"chroot=dir chroot to dir just before starting the VM\n",
> +"chroot=dir chroot to dir just before starting the VM\n"
> +"user=username switch to the specified user before 
> starting the VM\n"
> +"user=uid:gid dito, but use specified user-ID and 
> group-ID instead\n",

nit: ditto?

Ciao,

C

>  QEMU_ARCH_ALL)
>  SRST
> -``-run-with [async-teardown=on|off][,chroot=dir]``
> +``-run-with [async-teardown=on|off][,chroot=dir][,user=username|uid:gid]``
>  Set QEMU process lifecycle options.
>  
>  ``async-teardown=on`` enables asynchronous teardown. A new process called
> @@ -5013,6 +5016,10 @@ SRST
>  ``chroot=dir`` can be used for doing a chroot to the specified directory
>  immediately before starting the guest execution. This is especially useful
>  in combination with -runas.
> +
> +``user=username`` or ``user=uid:gid`` can be used to drop root privileges
> +by switching to the specified user (via username) or user and group
> +(via uid:gid) immediately before starting guest execution.
>  ERST
>  #endif
>  
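For context on the two accepted forms: os_set_runas() has to tell a numeric "uid:gid" pair apart from a plain user name (the username form is then resolved with a getpwnam()-style lookup). A self-contained sketch of such a parse, illustrative only and not QEMU's actual os-posix.c code:

/* Illustrative sketch only: distinguishing "uid:gid" from a username,
 * in the spirit of what os_set_runas() has to do. Not QEMU's code. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static bool parse_uid_gid(const char *arg, uid_t *uid, gid_t *gid)
{
    char *ep;
    unsigned long u, g;

    u = strtoul(arg, &ep, 10);      /* uid part */
    if (ep == arg || *ep != ':') {
        return false;               /* no "uid:" prefix -> treat as username */
    }
    g = strtoul(ep + 1, &ep, 10);   /* gid part */
    if (*ep != '\0') {
        return false;
    }
    *uid = (uid_t)u;
    *gid = (gid_t)g;
    return true;
}

int main(void)
{
    uid_t uid;
    gid_t gid;

    if (parse_uid_gid("1000:100", &uid, &gid)) {
        printf("uid=%lu gid=%lu\n", (unsigned long)uid, (unsigned long)gid);
    } else {
        printf("treat as username\n");
    }
    return 0;
}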

Re: Revisiting parallel save/restore

2024-05-02 Thread Claudio Fontana
On 4/26/24 16:50, Daniel P. Berrangé wrote:
> On Fri, Apr 26, 2024 at 11:44:38AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>>> On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
>>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>>
>>>>> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>>>>>> A good starting point on this journey is supporting the new mapped-ram
>>>>>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>>>>>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
>>>>>> sure how to detect if a saved image is in mapped-ram format vs the existing,
>>>>>> sequential stream format.
>>>>>
>>>>> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>>>>>
>>>>> A question is whether we make that feature mandatory for all save images,
>>>>> or implied by another feature (parallel save), or a directly controllable
>>>>> feature with opt-in.
>>>>>
>>>>> The former breaks back compat with existing libvirt, while the latter 2
>>>>> options are net new so don't have compat implications.
>>>>>
>>>>> In terms of actual data blocks written on disk, mapped-ram should be the
>>>>> same size, or smaller, than the existing format.
>>>>>
>>>>> In terms of logical file size, however, mapped-ram will almost always be
>>>>> larger.
>>>>>
>>>>> This is because mapped-ram will result in a file whose logical size matches
>>>>> the guest RAM size, plus some header overhead, while being sparse so not
>>>>> all blocks are written.
>>>>>
>>>>> If tools handling save images aren't sparse-aware this could come across
>>>>> as a surprise and even be considered a regression.
>>>>>
>>>>> Mapped ram is needed for parallel saves since it lets each thread write
>>>>> to a specific region of the file.
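As an illustration of that per-thread-region idea, a toy sketch (file name, sizes and thread count made up, error handling trimmed) of the pwrite()-at-a-fixed-offset pattern that a RAM-to-file mapping enables:

/* Toy illustration of the mapped-ram write pattern: RAM chunk idx always
 * lands at file offset idx * CHUNK, so each thread owns a disjoint file
 * region and no shared file position is needed. */
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define CHUNK    4096
#define NTHREADS 2

static char ram[NTHREADS * CHUNK];   /* stand-in for guest RAM */
static int fd;

static void *writer(void *arg)
{
    int idx = *(int *)arg;

    /* fixed mapping: chunk idx -> offset idx * CHUNK */
    pwrite(fd, ram + idx * CHUNK, CHUNK, (off_t)idx * CHUNK);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    int ids[NTHREADS];

    memset(ram, 0xab, sizeof(ram));
    fd = open("mapped-ram.img", O_WRONLY | O_CREAT, 0600);
    if (fd < 0) {
        return 1;
    }
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, writer, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
    }
    close(fd);
    return 0;
}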
>>>>>
>>>>> Mapped ram is good for non-parallel saves too though, because the mapping
>>>>> of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
>>>>> Currently libvirt has to tunnel over its iohelper to futz alignment
>>>>> needed for O_DIRECT. This makes it desirable to use in general, but back
>>>>> compat hurts...
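To make the alignment point concrete: with O_DIRECT the buffer address, the transfer size and the file offset must all be block-aligned, which is what libvirt's iohelper currently papers over. A minimal sketch, assuming a 4096-byte alignment requirement and a placeholder file name:

/* O_DIRECT alignment sketch: an unaligned buffer, size or offset makes
 * the I/O fail with EINVAL; posix_memalign() provides a usable buffer. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096

int main(void)
{
    void *buf;
    int fd = open("savefile.img", O_WRONLY | O_CREAT | O_DIRECT, 0600);

    if (fd < 0) {
        return 1;
    }
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0) {   /* aligned buffer */
        close(fd);
        return 1;
    }
    memset(buf, 0, ALIGN);

    ssize_t n = pwrite(fd, buf, ALIGN, 0);   /* aligned size and offset */

    free(buf);
    close(fd);
    return n == ALIGN ? 0 : 1;
}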

>>>> Note that QEMU doesn't support O_DIRECT without multifd.
>>>>
>>>> From mapped-ram patch series v4:
>>>>
>>>> - Dropped support for direct-io with fixed-ram _without_ multifd. This
>>>>   is something I said I would do for this version, but I had to drop
>>>>   it because performance is really bad. I think the single-threaded
>>>>   precopy code cannot cope with the extra latency/synchronicity of
>>>>   O_DIRECT.
>>>
>>> Note the reason for using O_DIRECT is *not* to make saving / restoring
>>> the guest VM faster. Rather it is to ensure that saving/restoring a VM
>>> does not trash the host I/O / buffer cache, which will negatively impact
>>> performance of all the *other* concurrently running VMs.

You can absolutely also thrash yourself, not only other VMs.

>>
>> Well, there's surely a performance degradation threshold that negates
>> the benefits of preserving the caches. But maybe it's not as low as I
>> initially thought then.
> 
> I guess you could say that O_DIRECT makes saving/restoring have a
> predictable speed, because it will no longer randomly vary depending
> on how much free RAM happens to be available at a given time. Time
> will be dominated largely by the underlying storage I/O performance.

With fast NVMe disks, my observation is that O_DIRECT without multifd is
bottlenecked on libvirt + QEMU throughput, not on storage I/O performance.

> 
> With regards,
> Daniel

Ciao,

Claudio


Re: [PATCH] virt-admin: Fix segfault when libvirtd dies

2024-03-19 Thread Claudio Fontana
On 3/19/24 15:58, Jiri Denemark wrote:
> On Tue, Mar 19, 2024 at 14:34:08 +0100, Claudio Fontana wrote:
>> On 3/19/24 12:02, Adam Julis wrote:
>>> vshAdmCatchDisconnect requires non-NULL structure vshControl for
>>> getting connection name (stored at opaque), but
>>> virAdmConnectRegisterCloseCallback at vshAdmConnect called it
>>> with NULL.
>>>
>>> Signed-off-by: Adam Julis
>>> ---
>>>  tools/virt-admin.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/tools/virt-admin.c b/tools/virt-admin.c
>>> index 37bc6fe4f0..0766032e4a 100644
>>> --- a/tools/virt-admin.c
>>> +++ b/tools/virt-admin.c
>>> @@ -112,7 +112,7 @@ vshAdmConnect(vshControl *ctl, unsigned int flags)
>>>          return -1;
>>>      } else {
>>>          if (virAdmConnectRegisterCloseCallback(priv->conn, vshAdmCatchDisconnect,
>>> -                                               NULL, NULL) < 0)
>>> +                                               ctl, NULL) < 0)
>>>              vshError(ctl, "%s", _("Unable to register disconnect callback"));
>>>
>>>          if (priv->wantReconnect)
>> 
>> Hi,
>> 
>> if that is the case I would then expect that
>> virAdmConnectRegisterCloseCallback() needs to also be updated
>> with:
>> 
>> +virCheckNonNullArgGoto(opaque, error);
>> 
>> or something like that. Is there a reason why it isn't? We'd better
>> catch the error early if the API is used wrongly.
> 
> Well, vshAdmCatchDisconnect requires opaque to be non-NULL, but
> other callbacks registered with virAdmConnectRegisterCloseCallback
> may not need any opaque data.
> 

Fair enough.

Reviewed-by: Claudio Fontana <cfontana@suse.de>


Re: [PATCH] virt-admin: Fix segfault when libvirtd dies

2024-03-19 Thread Claudio Fontana
On 3/19/24 12:02, Adam Julis wrote:
> vshAdmCatchDisconnect requires non-NULL structure vshControl for
> getting connection name (stored at opaque), but
> virAdmConnectRegisterCloseCallback at vshAdmConnect called it
> with NULL.
> 
> Signed-off-by: Adam Julis 
> ---
>  tools/virt-admin.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/virt-admin.c b/tools/virt-admin.c
> index 37bc6fe4f0..0766032e4a 100644
> --- a/tools/virt-admin.c
> +++ b/tools/virt-admin.c
> @@ -112,7 +112,7 @@ vshAdmConnect(vshControl *ctl, unsigned int flags)
>          return -1;
>      } else {
>          if (virAdmConnectRegisterCloseCallback(priv->conn, vshAdmCatchDisconnect,
> -                                               NULL, NULL) < 0)
> +                                               ctl, NULL) < 0)
>              vshError(ctl, "%s", _("Unable to register disconnect callback"));
>  
>          if (priv->wantReconnect)

Hi,

if that is the case I would then expect that 
virAdmConnectRegisterCloseCallback() needs to also be updated with:

+virCheckNonNullArgGoto(opaque, error);

or something like that. Is there a reason why it isn't? We'd better catch the
error early if the API is used wrongly.
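To make the failure mode concrete, here is a stand-alone sketch (toy types and a toy registration function, not libvirt's actual API) of a close callback that dereferences its opaque pointer, and why registering it with a NULL opaque segfaults:

/* Toy sketch, not libvirt's code: the callback assumes opaque != NULL,
 * so registering it with NULL (as the old code did) crashes on the
 * first disconnect; passing &ctl, as in the fix, is safe. */
#include <stdio.h>

typedef struct {
    const char *connname;
} Ctl;

typedef void (*CloseCb)(void *opaque);

static CloseCb registered_cb;
static void *registered_opaque;

/* stand-in for virAdmConnectRegisterCloseCallback() */
static void register_close_cb(CloseCb cb, void *opaque)
{
    registered_cb = cb;
    registered_opaque = opaque;
}

static void catch_disconnect(void *opaque)
{
    Ctl *ctl = opaque;   /* segfaults here if opaque == NULL */

    printf("connection '%s' closed\n", ctl->connname);
}

int main(void)
{
    Ctl ctl = { .connname = "virt-admin" };

    register_close_cb(catch_disconnect, &ctl);   /* the fix: &ctl, not NULL */
    registered_cb(registered_opaque);            /* simulate the disconnect */
    return 0;
}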

Claudio


Re: [libvirt RFCv11 00/33] multifd save restore prototype

2023-11-27 Thread Claudio Fontana
On 11/27/23 11:50, Daniel P. Berrangé wrote:
> On Mon, Nov 27, 2023 at 11:40:29AM +0100, Claudio Fontana wrote:
>> On 11/27/23 11:18, Daniel P. Berrangé wrote:
>>> On Mon, Nov 27, 2023 at 10:43:58AM +0100, Claudio Fontana wrote:
>>>> Hi all,
>>>>
>>>> I understand there has been some movement in this topic as the 
>>>> fixed-offset ram and multifd code evolves.
>>>>
>>>> I think I understood that now the idea is to pass from libvirt to QEMU two 
>>>> file descriptors,
>>>> one for writing metadata,
>>>> and a separate one for the actual memory pages, which is the one that can 
>>>> potentially be O_DIRECT.
>>>
>>> We determined that O_DIRECT changes propagate across dup'd file
>>> descriptors, so we have only two choices
>>>
>>>   * 1 FD, and QEMU has to take care to toggle O_DIRECT on/off
>>> repeatedly depending on what phase it is in
>>>   * 2 FDs, one with and one without O_DIRECT
>>>
>>> Either is viable for libvirt. I have a mild preference for having
>>> 1 FD, but not enough to call it a design blocker. So at the
>>> discretion of whomever implements the QEMU part.
>>>
>>>> 1) I would assume that libvirt would then need to check if the user 
>>>> requested --parallel / --parallel-connections to enable QEMU multifd.
>>>
>>> Yes
>>>
>>>> 2) I would also assume that libvirt would check the presence of 
>>>> --bypass-cache as the condition to set O_DIRECT on the second (memory 
>>>> pages fd), and to enable QEMU "io-direct" feature.
>>>
>>> Yes
>>>
>>>> 3) I would tentatively suggest that when it comes to fixed-ram-offset, the 
>>>> condition to enable that one is a check like the one currently in libvirt:
>>>>
>>>> src/util/virfile.c::virFileDiskCopy()
>>>>
>>>> ie checking that we are writing to a seekable file that is not S_ISBLK.
>>>>
>>>> Does this match your understanding/reasoning?
>>>
>>> Both the io-direct and fixed-ram-offset features are dependent on
>>> new QEMU impls, so there is a mild backwards compatibility concern.
>>>
>>> ie, let's say we are running QEMU 9.0.0, but with old machine
>>> type pc-i440fx-8.0.0, and we save the state, but want to then
>>> restore with QEMU 8.2.0.
>>>
>>> Essentially we must NOT use io-direct/fixed-ram-offset if we
>>> want the ability to migrate to older QEMU. At the same time I
>>> would like us to be able to take advantage of this new QEMU
>>> support to the greatest extent possible, even if not doing the
>>> --parallel stuff which was the original motivator.
>>>
>>> Thus, we need some way to decide whether to use the new or the
>>> old on disk format.
>>>
>>> I wonder if having a setting in '/etc/libvirt/qemu.conf' is
>>> sufficient, or whether we must expose a flag via the API.
>>>
>>> With regards,
>>> Daniel
>>
>> Thanks Daniel,
>>
>> that's an interesting point. The new fixed-ram-offset format is a QEMU 
>> format, and as such I presume that in theory there is a
>>
>> qemu_saveimage.h:#define QEMU_SAVE_VERSION 2
>>
>> that could be bumped to 3 if this new format is used?
>>
>> But then again, libvirt would need to decide whether to save in "old QEMU 
>> compatibility mode" or in the new QEMU_SAVE_VERSION 3 mode that allows for 
>> fixed-ram-offset.
>>
>> Maybe a new libvirt option for controlling which QEMU_SAVE_VERSION format to 
>> use for the save, with the default being v2 for backward compatibility 
>> reasons?
> 
> What makes me reluctant to add public API, is that generally I feel that
> things like file format versions are internal/private impl details that
> mgmt apps should not need to care about.
> 
> Even the compatibility problem should only be a short term issue, since
> there are only limited version combinations that end up getting used in
> practice, and so-called "backwards" version compat usage is even rarer.
> 
> I guess if we stick with a qemu.conf setting initially, we can revisit
> later if we find an API control knob is required.
> 
> With regards,
> Daniel

I think qemu.conf also works; I'd still default to v2 for a while.
We can document that setting v3 in qemu.conf is required in order to
save in the new format and get these additional features.
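For illustration, a sketch of what the version gating could look like on the restore side; the struct layout, constants and magic value below are made up, not libvirt's actual qemu_saveimage.h definitions:

/* Illustrative sketch only: a versioned save-image header, where v2 is
 * the backward-compatible sequential format and v3 (hypothetical here)
 * would select the mapped-ram restore path. */
#include <stdint.h>
#include <stdio.h>

#define SAVE_VERSION_COMPAT 2
#define SAVE_VERSION_MAPPED 3

typedef struct {
    char     magic[16];
    uint32_t version;
    /* ... remaining header fields ... */
} SaveImageHeader;

static int check_restore_version(const SaveImageHeader *hdr)
{
    if (hdr->version < SAVE_VERSION_COMPAT ||
        hdr->version > SAVE_VERSION_MAPPED) {
        fprintf(stderr, "unsupported save image version %u\n",
                (unsigned)hdr->version);
        return -1;
    }
    return 0;   /* caller picks the stream or mapped-ram reader */
}

int main(void)
{
    SaveImageHeader hdr = { .magic = "ExampleSaveImage", .version = 2 };

    return check_restore_version(&hdr) == 0 ? 0 : 1;
}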

Thanks,

Claudio


Re: [libvirt RFCv11 00/33] multifd save restore prototype

2023-11-27 Thread Claudio Fontana
On 11/27/23 11:18, Daniel P. Berrangé wrote:
> On Mon, Nov 27, 2023 at 10:43:58AM +0100, Claudio Fontana wrote:
>> Hi all,
>>
>> I understand there has been some movement in this topic as the fixed-offset 
>> ram and multifd code evolves.
>>
>> I think I understood that now the idea is to pass from libvirt to QEMU two 
>> file descriptors,
>> one for writing metadata,
>> and a separate one for the actual memory pages, which is the one that can 
>> potentially be O_DIRECT.
> 
> We determined that O_DIRECT changes propagate across dup'd file
> descriptors, so we have only two choices
> 
>   * 1 FD, and QEMU has to take care to toggle O_DIRECT on/off
> repeatedly depending on what phase it is in
>   * 2 FDs, one with and one without O_DIRECT
> 
> Either is viable for libvirt. I have a mild preference for having
> 1 FD, but not enough to call it a design blocker. So at the
> discretion of whomever implements the QEMU part.
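A small sketch of option 1, toggling O_DIRECT on a single FD with fcntl(); the propagation mentioned above follows from F_SETFL acting on the open file description, which dup()'d descriptors share:

/* Sketch: F_SETFL operates on the open file description, so toggling
 * O_DIRECT is visible through every dup()'d FD - which is why the 2-FD
 * scheme needs two independent open() calls rather than a dup(). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static int set_direct_io(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    flags = enable ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
    return fcntl(fd, F_SETFL, flags);
}

int main(void)
{
    int fd = open("save.img", O_WRONLY | O_CREAT, 0600);

    if (fd < 0) {
        return 1;
    }
    int dupfd = dup(fd);    /* shares the file description */

    set_direct_io(fd, 1);   /* dupfd now has O_DIRECT set as well */
    set_direct_io(fd, 0);   /* ...and loses it again */

    close(dupfd);
    close(fd);
    return 0;
}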
> 
>> 1) I would assume that libvirt would then need to check if the user 
>> requested --parallel / --parallel-connections to enable QEMU multifd.
> 
> Yes
> 
>> 2) I would also assume that libvirt would check the presence of 
>> --bypass-cache as the condition to set O_DIRECT on the second (memory pages 
>> fd), and to enable QEMU "io-direct" feature.
> 
> Yes
> 
>> 3) I would tentatively suggest that when it comes to fixed-ram-offset, the 
>> condition to enable that one is a check like the one currently in libvirt:
>>
>> src/util/virfile.c::virFileDiskCopy()
>>
>> ie checking that we are writing to a seekable file that is not S_ISBLK.
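A compact sketch of that condition, mirroring the idea of virFileDiskCopy() rather than its exact code:

/* Sketch: fixed-ram-offset wants a seekable target that is not a block
 * device; pipes and sockets fail the lseek() probe with ESPIPE. */
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

static bool supports_fixed_ram_offset(int fd)
{
    struct stat st;

    if (fstat(fd, &st) < 0 || S_ISBLK(st.st_mode)) {
        return false;   /* block devices don't qualify */
    }
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

int main(void)
{
    return supports_fixed_ram_offset(STDIN_FILENO) ? 0 : 1;
}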
>>
>> Does this match your understanding/reasoning?
> 
> Both the io-direct and fixed-ram-offset features are dependent on
> new QEMU impls, so there is a mild backwards compatibility concern.
> 
> ie, let's say we are running QEMU 9.0.0, but with old machine
> type pc-i440fx-8.0.0, and we save the state, but want to then
> restore with QEMU 8.2.0.
> 
> Essentially we must NOT use io-direct/fixed-ram-offset if we
> want the ability to migrate to older QEMU. At the same time I
> would like us to be able to take advantage of this new QEMU
> support to the greatest extent possible, even if not doing the
> --parallel stuff which was the original motivator.
> 
> Thus, we need some way to decide whether to use the new or the
> old on disk format.
> 
> I wonder if having a setting in '/etc/libvirt/qemu.conf' is
> sufficient, or whether we must expose a flag via the API.
> 
> With regards,
> Daniel

Thanks Daniel,

that's an interesting point. The new fixed-ram-offset format is a QEMU format, 
and as such I presume that in theory there is a

qemu_saveimage.h:#define QEMU_SAVE_VERSION 2

that could be bumped to 3 if this new format is used?

But then again, libvirt would need to decide whether to save in "old QEMU 
compatibility mode" or in the new QEMU_SAVE_VERSION 3 mode that allows for 
fixed-ram-offset.

Maybe a new libvirt option for controlling which QEMU_SAVE_VERSION format to 
use for the save, with the default being v2 for backward compatibility reasons?

Thanks,

Claudio
