Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Anton Kuchin

On 13/04/2023 14:03, Stefan Hajnoczi wrote:

On Thu, 13 Apr 2023 at 04:20, Hanna Czenczek  wrote:

On 12.04.23 22:51, Stefan Hajnoczi wrote:

On Tue, Apr 11, 2023 at 05:05:12PM +0200, Hanna Czenczek wrote:

If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature,
setting the vhost features will set this feature, too.  Doing so
disables all vrings, which may not be intended.

For example, enabling or disabling logging during migration requires
setting those features (to set or unset VHOST_F_LOG_ALL), which will
automatically disable all vrings.  In either case, the VM is running
(disabling logging is done after a failed or cancelled migration, and
only once the VM is running again, see comment in
memory_global_dirty_log_stop()), so the vrings should really be enabled.
As a result, the back-end seems to hang.

To fix this, we must remember whether the vrings are supposed to be
enabled, and, if so, re-enable them after a SET_FEATURES call that set
VHOST_USER_F_PROTOCOL_FEATURES.

It seems less than ideal that there is a short period in which the VM is
running but the vrings will be stopped (between SET_FEATURES and
SET_VRING_ENABLE).  To fix this, we would need to change the protocol,
e.g. by introducing a new flag or vhost-user protocol feature to disable
disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add
new functions for setting/clearing singular feature bits (so that
F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).

Even with such a potential addition to the protocol, we still need this
fix here, because we cannot expect that back-ends will implement this
addition.

Signed-off-by: Hanna Czenczek 
---
   include/hw/virtio/vhost.h | 10 ++++++++++
   hw/virtio/vhost.c         | 13 +++++++++++++
   2 files changed, 23 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..2fe02ed5d4 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -90,6 +90,16 @@ struct vhost_dev {
     int vq_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+
+    /*
+     * Whether the virtqueues are supposed to be enabled (via
+     * SET_VRING_ENABLE).  Setting the features (e.g. for
+     * enabling/disabling logging) will disable all virtqueues if
+     * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
+     * re-enable them if this field is set.
+     */
+    bool enable_vqs;
+
     /**
      * vhost feature handling requires matching the feature set
      * offered by a backend which may be a subset of the total
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a266396576..cbff589efa 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -50,6 +50,8 @@ static unsigned int used_memslots;
 static QLIST_HEAD(, vhost_dev) vhost_devices =
     QLIST_HEAD_INITIALIZER(vhost_devices);
 
+static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
+
 bool vhost_has_free_slot(void)
 {
     unsigned int slots_limit = ~0U;
@@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
         }
     }
 
+    if (dev->enable_vqs) {
+        /*
+         * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all
+         * virtqueues, even if that was not intended; re-enable them if
+         * necessary.
+         */
+        vhost_dev_set_vring_enable(dev, true);
+    }
+
 out:
     return r;
 }
@@ -1896,6 +1907,8 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
 
 static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
 {
+    hdev->enable_vqs = enable;
+
     if (!hdev->vhost_ops->vhost_set_vring_enable) {
         return 0;
     }

The vhost-user spec doesn't say that VHOST_F_LOG_ALL needs to be toggled
at runtime and I don't think VHOST_USER_SET_PROTOCOL_FEATURES is
intended to be used like that. This issue shows why doing so is a bad
idea.

VHOST_F_LOG_ALL does not need to be toggled to control logging. Logging
is controlled at runtime by the presence of the dirty log
(VHOST_USER_SET_LOG_BASE) and the per-vring logging flag
(VHOST_VRING_F_LOG).

Technically, the spec doesn’t say that SET_LOG_BASE is required.  It says:

“To start/stop logging of data/used ring writes, the front-end may send
messages VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and
VHOST_USER_SET_VRING_ADDR with VHOST_VRING_F_LOG in ring’s flags set to
1/0, respectively.”

(So the spec also very much does imply that toggling F_LOG_ALL at
runtime is a valid way to enable/disable logging.  If we were to no
longer do that, we should clarify it there.)
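
For illustration, the message sequence the spec text above describes would
look roughly like this on the front-end side (a sketch with hypothetical
helper names, not the actual QEMU code path, which goes through
vhost_dev_set_features() and the per-vring address setup):

#include <stdbool.h>
#include <stdint.h>

#define VHOST_F_LOG_ALL   26   /* feature bit, from linux/vhost.h */
#define VHOST_VRING_F_LOG 0    /* bit in the vring address flags */

/* Hypothetical stand-ins for sending the two vhost-user messages */
void send_set_features(uint64_t features);           /* VHOST_USER_SET_FEATURES */
void send_set_vring_addr(int index, uint64_t flags); /* VHOST_USER_SET_VRING_ADDR */

/*
 * Toggle dirty logging as the quoted spec text describes: flip
 * VHOST_F_LOG_ALL in the feature set and VHOST_VRING_F_LOG in each
 * ring's flags.  Note that this SET_FEATURES message is exactly the
 * one that, per the patch above, also disables all vrings when
 * VHOST_USER_F_PROTOCOL_FEATURES is included.
 */
void toggle_logging(uint64_t acked_features, int nvrings, bool enable)
{
    uint64_t features = acked_features;

    if (enable) {
        features |= 1ULL << VHOST_F_LOG_ALL;
    } else {
        features &= ~(1ULL << VHOST_F_LOG_ALL);
    }
    send_set_features(features);

    for (int i = 0; i < nvrings; i++) {
        send_set_vring_addr(i, enable ? 1ULL << VHOST_VRING_F_LOG : 0);
    }
}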

I missed that VHOST_VRING_F_LOG only controls logging used ring writes
while writes to descriptors are always logged when VHOST_F_LOG_ALL is
set. I agree that the spec does require VHOST_F_LOG_ALL to be toggled
at runtime.

What I suggested won't work.


But is there a valid use-case for logging some dirty memory but not all?
I

Re: [Virtio-fs] [RFC 2/2] vhost-user-fs: Implement stateful migration

2023-03-22 Thread Anton Kuchin

On 21/03/2023 19:49, Hanna Czenczek wrote:

On 20.03.23 13:39, Anton Kuchin wrote:

On 20/03/2023 11:33, Hanna Czenczek wrote:

On 17.03.23 19:37, Anton Kuchin wrote:

On 17/03/2023 19:52, Hanna Czenczek wrote:

On 17.03.23 18:19, Anton Kuchin wrote:

On 13/03/2023 19:48, Hanna Czenczek wrote:

A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost-user operations
FS_SET_STATE_FD, FS_GET_STATE, and FS_SET_STATE.



[...]


 static const VMStateDescription vuf_vmstate = {
     .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        {
+            .name = "back-end",
+            .info = &(const VMStateInfo) {
+                .name = "virtio-fs back-end state",
+                .get = vuf_load_state,
+                .put = vuf_save_state,
+            },
+        },


I've been working on a stateless migration patch [1], and it was discussed
there that we need to keep some kind of blocker by default if orchestrators
rely on the unmigratable field in the virtio-fs vmstate to block migration.
For this purpose I've implemented a flag that selects "none" or "external"
and is checked in pre_save, so it could be extended with an "internal"
option. We didn't come to a conclusion on whether we also need to check
incoming migration; the discussion has stopped for a while but I'm going
back to it now.

I would appreciate it if you have time to take a look at the discussion
and consider the idea proposed there to store internal state as a
subsection of the vmstate, to make it an option rather than mandatory.

[1]
https://patchew.org/QEMU/20230217170038.1273710-1-antonkuc...@yandex-team.ru/
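
(For context, a vmstate subsection is only written when its .needed
callback returns true, so the back-end state could be made optional
roughly like this. A sketch only: the VHOST_USER_MIGRATION_TYPE_INTERNAL
value and the fs->migration_type field are assumptions taken from the
discussion, not existing code.)

/* Sketch: back-end state as an optional vmstate subsection */
static bool vuf_backend_state_needed(void *opaque)
{
    VHostUserFS *fs = VHOST_USER_FS(opaque);

    /* Only put the back-end state into the stream for internal migration */
    return fs->migration_type == VHOST_USER_MIGRATION_TYPE_INTERNAL;
}

static const VMStateDescription vuf_backend_state = {
    .name = "vhost-user-fs/back-end",
    .version_id = 1,
    .needed = vuf_backend_state_needed,
    .fields = (VMStateField[]) {
        /* the "back-end" .get/.put field from the patch would go here */
        VMSTATE_END_OF_LIST()
    },
};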


So far I’ve mostly considered these issues orthogonal.  If your 
stateless migration goes in first, then state is optional and I’ll 
adjust this series.
If stateful migration goes in first, then your series can simply 
make state optional by introducing the external option, no?


Not really. State can be easily extended by subsections but not 
trimmed. Maybe this can be worked around by defining two types of 
vmstate and selecting the correct one at migration, but I'm not sure.


I thought your patch included a switch on the vhost-user-fs device 
(on the qemu side) to tell qemu what migration data to expect. Can 
we not transfer a 0-length state for 'external', and assert this on 
the destination side?


Looks like I really need to make the description of my patch and the
documentation more clear :) My patch proposes a switch on the _source_
side to select which data to save in the stream, mostly to protect old
orchestrators that don't expect virtio-fs to be migratable (and for the
internal case it can be extended to select whether qemu needs to request
state from the backend). Michael insists that we also need to check on
the destination, but I disagree because I believe that we can figure
this out from the stream data without additional device flags.

But maybe we could also consider making stateless migration a 
special case of stateful migration; if we had stateful migration, 
can’t we just implement stateless migration by telling virtiofsd 
that it should submit a special “I have no state” pseudo-state, 
i.e. by having a switch on virtiofsd instead?


Sure. Backend can send empty state (as your patch treats 0 length 
as a valid response and not error) or dummy state that can be 
recognized as stateless. The only potential problem is that then we 
need support in backend for new commands even to return dummy 
state, and if backend can support both types then we'll need some 
switch in backend to reply with real or empty state.


Yes, exactly.



Off the top of my head, some downsides of that approach would be
(1) it’d need a switch on the virtiofsd side, not on the qemu side 
(not sure if that’s a downside, but a difference for sure),


Why would you? It seems to me that this affects only how qemu 
treats the vmstate of device. If the state was requested backend 
sends it to qemu. If state subsection is present in stream qemu 
sends it to the backend for loading. Stateless one just doesn't 
request state from the backend. Or am I missing something?


and (2) we’d need at least some support for this on the virtiofsd 
side, i.e. practically can’t come quicker than stateful migration 
support.


Not much, essentially this is just a reconnect. I've sent a draft 
of a reconnect patch for old C-virtiofsd, for rust version it takes 
much longer because I'm learning rust and I'm not really good at it 
yet.


I meant these two downsides not for your proposal, but instead if we
implemented stateless migration only in the back-end; i.e. the
front-end would only implement stateful migration, and the back-end
would send and accept an empty state.

Re: [Virtio-fs] [RFC 2/2] vhost-user-fs: Implement stateful migration

2023-03-20 Thread Anton Kuchin

On 20/03/2023 11:33, Hanna Czenczek wrote:

On 17.03.23 19:37, Anton Kuchin wrote:

On 17/03/2023 19:52, Hanna Czenczek wrote:

On 17.03.23 18:19, Anton Kuchin wrote:

On 13/03/2023 19:48, Hanna Czenczek wrote:

A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost-user operations
FS_SET_STATE_FD, FS_GET_STATE, and FS_SET_STATE.



[...]


 static const VMStateDescription vuf_vmstate = {
     .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        {
+            .name = "back-end",
+            .info = &(const VMStateInfo) {
+                .name = "virtio-fs back-end state",
+                .get = vuf_load_state,
+                .put = vuf_save_state,
+            },
+        },


I've been working on a stateless migration patch [1], and it was discussed
there that we need to keep some kind of blocker by default if orchestrators
rely on the unmigratable field in the virtio-fs vmstate to block migration.
For this purpose I've implemented a flag that selects "none" or "external"
and is checked in pre_save, so it could be extended with an "internal"
option. We didn't come to a conclusion on whether we also need to check
incoming migration; the discussion has stopped for a while but I'm going
back to it now.

I would appreciate it if you have time to take a look at the discussion
and consider the idea proposed there to store internal state as a
subsection of the vmstate, to make it an option rather than mandatory.

[1]
https://patchew.org/QEMU/20230217170038.1273710-1-antonkuc...@yandex-team.ru/


So far I’ve mostly considered these issues orthogonal.  If your 
stateless migration goes in first, then state is optional and I’ll 
adjust this series.
If stateful migration goes in first, then your series can simply 
make state optional by introducing the external option, no?


Not really. State can be easily extended by subsections but not 
trimmed. Maybe this can be worked around by defining two types of 
vmstate and selecting the correct one at migration, but I'm not sure.


I thought your patch included a switch on the vhost-user-fs device (on 
the qemu side) to tell qemu what migration data to expect. Can we not 
transfer a 0-length state for 'external', and assert this on the 
destination side?


Looks like I really need to make the description of my patch and the
documentation more clear :) My patch proposes a switch on the _source_
side to select which data to save in the stream, mostly to protect old
orchestrators that don't expect virtio-fs to be migratable (and for the
internal case it can be extended to select whether qemu needs to request
state from the backend). Michael insists that we also need to check on
the destination, but I disagree because I believe that we can figure
this out from the stream data without additional device flags.

But maybe we could also consider making stateless migration a 
special case of stateful migration; if we had stateful migration, 
can’t we just implement stateless migration by telling virtiofsd 
that it should submit a special “I have no state” pseudo-state, i.e. 
by having a switch on virtiofsd instead?


Sure. Backend can send empty state (as your patch treats 0 length as 
a valid response and not error) or dummy state that can be recognized 
as stateless. The only potential problem is that then we need support 
in backend for new commands even to return dummy state, and if 
backend can support both types then we'll need some switch in backend 
to reply with real or empty state.
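
Such a back-end switch could be as small as this (a hypothetical
virtiofsd-side sketch; opt_stateful and serialize_chunk() are
illustrative names, not existing code):

#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

static bool opt_stateful;   /* hypothetical back-end configuration switch */

/* Real state path: copy one chunk into the shared memory area and
 * return the number of bytes remaining from state_offset onward */
ssize_t serialize_chunk(uint64_t state_offset, size_t size);

/*
 * FS_GET_STATE handler: a stateless back-end reports "0 bytes
 * remaining" on the first call, so the front-end records a 0-length
 * state (valid, not an error) and the destination has nothing to send.
 */
ssize_t handle_fs_get_state(uint64_t state_offset, size_t size)
{
    if (!opt_stateful) {
        return 0;
    }
    return serialize_chunk(state_offset, size);
}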


Yes, exactly.



Off the top of my head, some downsides of that approach would be
(1) it’d need a switch on the virtiofsd side, not on the qemu side 
(not sure if that’s a downside, but a difference for sure),


Why would you? It seems to me that this affects only how qemu treats 
the vmstate of device. If the state was requested backend sends it to 
qemu. If state subsection is present in stream qemu sends it to the 
backend for loading. Stateless one just doesn't request state from 
the backend. Or am I missing something?


and (2) we’d need at least some support for this on the virtiofsd 
side, i.e. practically can’t come quicker than stateful migration 
support.


Not much, essentially this is just a reconnect. I've sent a draft of 
a reconnect patch for old C-virtiofsd, for rust version it takes much 
longer because I'm learning rust and I'm not really good at it yet.


I meant these two downsides not for your proposal, but instead if we 
implemented stateless migration only in the back-end; i.e. the 
front-end would only implement stateful migration, and the back-end 
would send and accept an empty state.


Then, qemu would always request a “state”

Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-17 Thread Anton Kuchin

On 01/03/2023 17:33, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote:

We can't rely here entirely on
destination to block this because if VM is migrated to file and then
can't be loaded by destination there is no way to fallback and resume
the source so we need to have some kind of blocker on source by default.

So I commented about a fallback. IMO it's just orchestrator being silly:
don't kill source qemu until you have made sure destination loaded the
file, or re-load it, and all will be well.


I agree that it is always better to keep the source alive until the
destination is loaded and ready to take off. But in some cases resources
are limited or strictly partitioned, so we can't keep two VMs alive at the
same time, and the best option is to check all we need on the source to
make sure the destination will run.
Off the top of my head, a host-sized VM would need an entire additional
host to migrate properly, plus lots of memory streaming that can cause
huge downtime; but if memory is in shmem, a local migration to update qemu
can take as little as saving the device state to a file and running a new
qemu to load the devices, take over the memory, and continue running the
guest.



But a bigger issue that this highlights is simply that if migrating to
file you have no idea at all where the file will be loaded.  Maybe some
orchestrators know but I don't see how we can be sure all of them know.
The only time we know whether the file is loaded on the same host
where it was saved is when we actually load it.



Yes. Migration to file requires the orchestrator to be well aware of what
it is doing, because the file always contains only part of the state. This
is hard, but sometimes there are no other good options.



Re: [Virtio-fs] [RFC 2/2] vhost-user-fs: Implement stateful migration

2023-03-17 Thread Anton Kuchin

On 17/03/2023 19:52, Hanna Czenczek wrote:

On 17.03.23 18:19, Anton Kuchin wrote:

On 13/03/2023 19:48, Hanna Czenczek wrote:

A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost-user operations
FS_SET_STATE_FD, FS_GET_STATE, and FS_SET_STATE.

Signed-off-by: Hanna Czenczek 
---
  hw/virtio/vhost-user-fs.c | 171 +++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 170 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..df1fb02acc 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -20,8 +20,10 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "qemu/error-report.h"
+#include "qemu/memfd.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user-fs.h"
+#include "migration/qemu-file-types.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
@@ -298,9 +300,176 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
     return &fs->vhost_dev;
 }
 
+/**
+ * Fetch the internal state from the back-end (virtiofsd) and save it
+ * to `f`.
+ */
+static int vuf_save_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field, JSONWriter *vmdesc)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    int memfd = -1;
+    /* Size of the shared memory through which to transfer the state */
+    const size_t chunk_size = 4 * 1024 * 1024;
+    size_t state_offset;
+    ssize_t remaining;
+    void *shm_buf;
+    Error *local_err = NULL;
+    int ret, ret2;
+
+    /* Set up shared memory through which to receive the state from virtiofsd */
+    shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size,
+                               F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW,
+                               &memfd, &local_err);
+    if (!shm_buf) {
+        error_report_err(local_err);
+        ret = -ENOMEM;
+        goto early_fail;
+    }
+
+    /* Share the SHM area with virtiofsd */
+    ret = vhost_fs_set_state_fd(&fs->vhost_dev, memfd, chunk_size);
+    if (ret < 0) {
+        goto early_fail;

Don't we need some log message here too?


Sure, why not.  There are other places in this patch that just return 
-errno but print no error, I think they could all use a verbose error 
message.



+    }
+
+    /* Receive the virtiofsd state in chunks, and write them to `f` */
+    state_offset = 0;
+    do {
+        size_t this_chunk_size;
+
+        remaining = vhost_fs_get_state(&fs->vhost_dev, state_offset,
+                                       chunk_size);
+        if (remaining < 0) {
+            ret = remaining;
+            goto fail;
+        }
+
+        /* Prefix the whole state by its total length */
+        if (state_offset == 0) {
+            qemu_put_be64(f, remaining);
+        }
+
+        this_chunk_size = MIN(remaining, chunk_size);
+        qemu_put_buffer(f, shm_buf, this_chunk_size);
+        state_offset += this_chunk_size;
+    } while (remaining >= chunk_size);
+
+    ret = 0;
+fail:
+    /* Have virtiofsd close the shared memory */
+    ret2 = vhost_fs_set_state_fd(&fs->vhost_dev, -1, 0);
+    if (ret2 < 0) {
+        error_report("Failed to remove state FD from the vhost-user-fs back "
+                     "end: %s", strerror(-ret));
+        if (ret == 0) {
+            ret = ret2;
+        }
+    }
+
+early_fail:
+    if (shm_buf) {
+        qemu_memfd_free(shm_buf, chunk_size, memfd);
+    }
+
+    return ret;
+}
+
+/**
+ * Load the back-end's (virtiofsd's) internal state from `f` and send
+ * it over to that back-end.
+ */
+static int vuf_load_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    int memfd = -1;
+    /* Size of the shared memory through which to transfer the state */
+    const size_t chunk_size = 4 * 1024 * 1024;
+    size_t state_offset;
+    uint64_t remaining;
+    void *shm_buf;
+    Error *local_err = NULL;
+    int ret, ret2;
+
+    /* The state is prefixed by its total length, read that first */
+    remaining = qemu_get_be64(f);
+
+    /* Set up shared memory through which to send the state to virtiofsd */
+    shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size,
+                               F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW,
+                               &memfd, &local_err);
+    if (!shm_buf) {
+        error_report_err(local_err);
+        ret = -ENOMEM;
+        goto early_fail;
+    }
+
+    /* Share the SHM area with virtiofsd *

Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-17 Thread Anton Kuchin

On 06/03/2023 23:53, Michael S. Tsirkin wrote:

On Mon, Mar 06, 2023 at 10:55:29PM +0200, Anton Kuchin wrote:

On 01/03/2023 22:22, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 09:35:56PM +0200, Anton Kuchin wrote:

I do trust them :)
I have to, otherwise we would need to pack all the properties and
flags of qemu to the migration stream and validate them at
destination or entire migration ends up broken beyond repair if
orchestrators tend to do stupid things and need babysitting.

This is not at all a crazy idea. It has some disadvantages
too esp around cross version migration, which is why we
don't do this yet.


Indeed. I know VMMs that chose this path. But as long as
qemu decided to trust orchestrators I think we'd better
keep this consistent across the codebase.

Only ivshmem_pre_load callback bothers to check device
property and virtio_net_tx_waiting_pre_load checks that
number of queue pairs doesn't exceed allowed maximum, all
other *_pre_load functions generally just initialize some
parts of state that need to be set before stream starts
loading.

We do validate things by necessity; we just try to do
as much as we can table-driven so it's not open-coded
and less visible. We used to have a lot more callbacks,
nowadays we try to keep it table-driven.
But e.g. pci streams RO state and validates that it's the same
on destination.



Sorry for the late reply. I agree that checks should be done if
we have the data at hand.
But in my opinion the situation here is a little different:
pci is emulated by qemu, and the RO state affects how emulation
will work on destination; the point of vhost-user is to
outsource device emulation logic to daemons outside qemu and
let the orchestrator manage both qemu and the daemons.
So engineering additional flags in qemu that don't affect
device operation, but exist only to check orchestrator correctness,
is excessive in my opinion.




Re: [RFC 1/2] vhost-user: Add interface for virtio-fs migration

2023-03-17 Thread Anton Kuchin

On 13/03/2023 19:48, Hanna Czenczek wrote:

Add a virtio-fs-specific vhost-user interface to facilitate migrating
back-end-internal state.  We plan to migrate the internal state simply
as a binary blob after the streaming phase, so all we need is a way to
transfer such a blob from and to the back-end.  We do so by using a
dedicated area of shared memory through which the blob is transferred in
chunks.

This patch adds the following vhost operations (and implements them for
vhost-user):

- FS_SET_STATE_FD: The front-end passes a dedicated shared memory area
   to the back-end.  This area will be used to transfer state via the
   other two operations.
   (After the transfer FS_SET_STATE_FD detaches the shared memory area
   again.)

- FS_GET_STATE: The front-end asks the back-end to place a chunk of
   internal state into the shared memory area.

- FS_SET_STATE: The front-end puts a chunk of internal state into the
   shared memory area, and asks the back-end to fetch it.

On the source side, the back-end is expected to serialize its internal
state either when FS_SET_STATE_FD is invoked, or when FS_GET_STATE is
invoked the first time.  On subsequent FS_GET_STATE calls, it memcpy()s
parts of that serialized state into the shared memory area.

On the destination side, the back-end is expected to collect the state
blob over all FS_SET_STATE calls, and then deserialize and apply it once
FS_SET_STATE_FD detaches the shared memory area.
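
To make the destination-side contract concrete, a back-end might
implement it roughly as follows (a sketch under assumed names; shm_buf is
the area mapped from the FS_SET_STATE_FD memfd, and deserialize_and_apply()
stands in for the back-end's real state loading):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical destination-side back-end globals */
static void    *shm_buf;        /* mapped from the FS_SET_STATE_FD memfd */
static uint8_t *collected;      /* state blob collected so far */
static size_t   collected_len;

static int deserialize_and_apply(const uint8_t *blob, size_t len);

/* FS_SET_STATE: append the chunk the front-end placed in shared memory */
static int handle_fs_set_state(uint64_t state_offset, size_t size)
{
    if (state_offset + size > collected_len) {
        uint8_t *p = realloc(collected, state_offset + size);
        if (!p) {
            return -1;
        }
        collected = p;
        collected_len = state_offset + size;
    }
    memcpy(collected + state_offset, shm_buf, size);
    return 0;
}

/* FS_SET_STATE_FD with a negative fd: transfer done, apply the blob */
static int handle_fs_set_state_fd_detach(void)
{
    return deserialize_and_apply(collected, collected_len);
}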

Signed-off-by: Hanna Czenczek 
---
  include/hw/virtio/vhost-backend.h |   9 ++
  include/hw/virtio/vhost.h         |  68 +++
  hw/virtio/vhost-user.c            | 138 ++
  hw/virtio/vhost.c                 |  29 +++
  4 files changed, 244 insertions(+)

diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index ec3fbae58d..fa3bd19386 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -42,6 +42,12 @@ typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque,
 typedef int (*vhost_backend_cleanup)(struct vhost_dev *dev);
 typedef int (*vhost_backend_memslots_limit)(struct vhost_dev *dev);
 
+typedef ssize_t (*vhost_fs_get_state_op)(struct vhost_dev *dev,
+                                         uint64_t state_offset, size_t size);
+typedef int (*vhost_fs_set_state_op)(struct vhost_dev *dev,
+                                     uint64_t state_offset, size_t size);
+typedef int (*vhost_fs_set_state_fd_op)(struct vhost_dev *dev, int memfd,
+                                        size_t size);
 typedef int (*vhost_net_set_backend_op)(struct vhost_dev *dev,
                                         struct vhost_vring_file *file);
 typedef int (*vhost_net_set_mtu_op)(struct vhost_dev *dev, uint16_t mtu);
@@ -138,6 +144,9 @@ typedef struct VhostOps {
     vhost_backend_init vhost_backend_init;
     vhost_backend_cleanup vhost_backend_cleanup;
     vhost_backend_memslots_limit vhost_backend_memslots_limit;
+    vhost_fs_get_state_op vhost_fs_get_state;
+    vhost_fs_set_state_op vhost_fs_set_state;
+    vhost_fs_set_state_fd_op vhost_fs_set_state_fd;
     vhost_net_set_backend_op vhost_net_set_backend;
     vhost_net_set_mtu_op vhost_net_set_mtu;
     vhost_scsi_set_endpoint_op vhost_scsi_set_endpoint;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..b1ad9785dd 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -336,4 +336,72 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
                            struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
                            struct vhost_inflight *inflight);
+
+/**
+ * vhost_fs_set_state_fd(): Share memory with a virtio-fs vhost
+ * back-end for transferring internal state for the purpose of
+ * migration.  Calling this function again will have the back-end
+ * unregister (free) the previously shared memory area.
+ *
+ * @dev: The vhost device
+ * @memfd: File descriptor associated with the shared memory to share.
+ * If negative, no memory area is shared, only releasing the
+ * previously shared area, and announcing the end of transfer
+ * (which, on the destination side, should lead to the
+ * back-end deserializing and applying the received state).
+ * @size: Size of the shared memory area
+ *
+ * Returns 0 on success, and -errno on failure.
+ */
+int vhost_fs_set_state_fd(struct vhost_dev *dev, int memfd, size_t size);
+
+/**
+ * vhost_fs_get_state(): Request the virtio-fs vhost back-end to place
+ * a chunk of migration state into the shared memory area negotiated
+ * through vhost_fs_set_state_fd().  May only be used for migration,
+ * and only by the source side.
+ *
+ * The back-end-internal migration state is treated as a binary blob,
+ * which is transferred in chunks to fit into the shared memory area.
+ *
+ * @dev: The vhost device
+ * @state

Re: [RFC 2/2] vhost-user-fs: Implement stateful migration

2023-03-17 Thread Anton Kuchin

On 13/03/2023 19:48, Hanna Czenczek wrote:

A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost-user operations FS_SET_STATE_FD,
FS_GET_STATE, and FS_SET_STATE.

Signed-off-by: Hanna Czenczek 
---
  hw/virtio/vhost-user-fs.c | 171 +++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 170 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..df1fb02acc 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -20,8 +20,10 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "qemu/error-report.h"
+#include "qemu/memfd.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user-fs.h"
+#include "migration/qemu-file-types.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
@@ -298,9 +300,176 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
     return &fs->vhost_dev;
 }
 
+/**
+ * Fetch the internal state from the back-end (virtiofsd) and save it
+ * to `f`.
+ */
+static int vuf_save_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field, JSONWriter *vmdesc)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    int memfd = -1;
+    /* Size of the shared memory through which to transfer the state */
+    const size_t chunk_size = 4 * 1024 * 1024;
+    size_t state_offset;
+    ssize_t remaining;
+    void *shm_buf;
+    Error *local_err = NULL;
+    int ret, ret2;
+
+    /* Set up shared memory through which to receive the state from virtiofsd */
+    shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size,
+                               F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW,
+                               &memfd, &local_err);
+    if (!shm_buf) {
+        error_report_err(local_err);
+        ret = -ENOMEM;
+        goto early_fail;
+    }
+
+    /* Share the SHM area with virtiofsd */
+    ret = vhost_fs_set_state_fd(&fs->vhost_dev, memfd, chunk_size);
+    if (ret < 0) {
+        goto early_fail;


Don't we need some log message here too?


+    }
+
+    /* Receive the virtiofsd state in chunks, and write them to `f` */
+    state_offset = 0;
+    do {
+        size_t this_chunk_size;
+
+        remaining = vhost_fs_get_state(&fs->vhost_dev, state_offset,
+                                       chunk_size);
+        if (remaining < 0) {
+            ret = remaining;
+            goto fail;
+        }
+
+        /* Prefix the whole state by its total length */
+        if (state_offset == 0) {
+            qemu_put_be64(f, remaining);
+        }
+
+        this_chunk_size = MIN(remaining, chunk_size);
+        qemu_put_buffer(f, shm_buf, this_chunk_size);
+        state_offset += this_chunk_size;
+    } while (remaining >= chunk_size);
+
+    ret = 0;
+fail:
+    /* Have virtiofsd close the shared memory */
+    ret2 = vhost_fs_set_state_fd(&fs->vhost_dev, -1, 0);
+    if (ret2 < 0) {
+        error_report("Failed to remove state FD from the vhost-user-fs back "
+                     "end: %s", strerror(-ret));
+        if (ret == 0) {
+            ret = ret2;
+        }
+    }
+
+early_fail:
+    if (shm_buf) {
+        qemu_memfd_free(shm_buf, chunk_size, memfd);
+    }
+
+    return ret;
+}
+
+/**
+ * Load the back-end's (virtiofsd's) internal state from `f` and send
+ * it over to that back-end.
+ */
+static int vuf_load_state(QEMUFile *f, void *pv, size_t size,
+                          const VMStateField *field)
+{
+    VirtIODevice *vdev = pv;
+    VHostUserFS *fs = VHOST_USER_FS(vdev);
+    int memfd = -1;
+    /* Size of the shared memory through which to transfer the state */
+    const size_t chunk_size = 4 * 1024 * 1024;
+    size_t state_offset;
+    uint64_t remaining;
+    void *shm_buf;
+    Error *local_err = NULL;
+    int ret, ret2;
+
+    /* The state is prefixed by its total length, read that first */
+    remaining = qemu_get_be64(f);
+
+    /* Set up shared memory through which to send the state to virtiofsd */
+    shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size,
+                               F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW,
+                               &memfd, &local_err);
+    if (!shm_buf) {
+        error_report_err(local_err);
+        ret = -ENOMEM;
+        goto early_fail;
+    }
+
+    /* Share the SHM area with virtiofsd */
+    ret = vhost_fs_set_state_fd(&fs->vhost_dev, memfd, chunk_size);
+    if (ret < 0) {
+        goto early_fail;
+    }
+
+    /*
+     * Read the virtiofsd state in chunks from `f`, and send them over
+     * to virtiofsd
+     */
+    state_offset = 0;
+    do {
+        size_t this_chunk_size = MIN(remaining, chunk_size);
+
+        if (qemu_get_buffer(f, shm_buf, this_chunk_size) < this_chunk_size) {
+            ret = -EINVAL;
+            goto fail;
+        }
+
+  

Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-06 Thread Anton Kuchin

On 01/03/2023 22:22, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 09:35:56PM +0200, Anton Kuchin wrote:

I do trust them :)
I have to, otherwise we would need to pack all the properties and
flags of qemu to the migration stream and validate them at
destination or entire migration ends up broken beyond repair if
orchestrators tend to do stupid things and need babysitting.

This is not at all a crazy idea. It has some disadvantages
too esp around cross version migration, which is why we
don't do this yet.



Indeed. I know VMMs that chose this path. But as long as
qemu decided to trust orchestrators I think we'd better
keep this consistent across the codebase.

Only ivshmem_pre_load callback bothers to check device
property and virtio_net_tx_waiting_pre_load checks that
number of queue pairs doesn't exceed allowed maximum, all
other *_pre_load functions generally just initialize some
parts of state that need to be set before stream starts
loading.



Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-01 Thread Anton Kuchin

On 01/03/2023 17:52, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 05:40:09PM +0200, Anton Kuchin wrote:

So catching errors is not the only purpose of this property, but it
definitely allows us to catch some obvious ones.

OK let's say this. If migration=external then migration just works.
If migration=internal it fails for now. We are agreed here right?

Our argument is whether to check on load or save?

I propose this compromise: two properties:
migration-load and migration-save

migration-load : how is incoming migration handled.
 internal - through qemu
 external - through the daemon

  checked in pre-load

migration-save : how is outgoing migration handled.
 internal - through qemu
 external - through the daemon

  checked in post-save

This way whether the check is on source or destination or both
is up to the user.

Hmm?


Stefan, what do you think about this idea?




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-01 Thread Anton Kuchin

On 01/03/2023 19:17, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 06:04:31PM +0200, Anton Kuchin wrote:

On 01/03/2023 17:24, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 05:07:28PM +0200, Anton Kuchin wrote:

On 28/02/2023 23:24, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote:

On 28/02/2023 16:57, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote:

I really don't understand why and what do you want to check on
destination.

Yes I understand your patch controls source. Let me try to rephrase
why I think it's better on destination.
Here's my understanding
- With vhost-user-fs state lives inside an external daemon.
A- If after load you connect to the same daemon you can get migration mostly
  for free.
B- If you connect to a different daemon then that daemon will need
  to pass information from original one.

Is this a fair summary?

Current solution is to set flag on the source meaning "I have an
orchestration tool that will make sure that either A or B is correct".

However both A and B can only be known when destination is known.
Especially as long as what we are really trying to do is just allow qemu
restarts, checking the flag on load will thus achieve it in a cleaner
way, in that the orchestration tool can reasonably keep the flag
clear normally and only set it if restarting qemu locally.


By comparison, with your approach orchestration tool will have
to either always set the flag (risky since then we lose the
extra check that we coded) or keep it clear and set before migration
(complex).

I hope I explained what and why I want to check.

I am far from a vhost-user-fs expert so maybe I am wrong but
I wanted to make sure I got the point across even if others
disagree.


Thank you for the explanation. Now I understand your concerns.

You are right about this mechanism being a bit risky if the orchestrator is
not using it properly, or clunky if it is used in the safest possible way.
That's why first attempt of this feature was with migration capability
to let orchestrator choose behavior right at the moment of migration.
But it has its own problems.

We can't move this check only to destination because one of main goals
was to prevent orchestrators that are unaware of vhost-user-fs specifics
from accidentally migrating such VMs. We can't rely here entirely on
destination to block this because if VM is migrated to file and then
can't be loaded by destination there is no way to fallback and resume
the source so we need to have some kind of blocker on source by default.

Interesting.  Why is there no way? Just load it back on source? Isn't
this how any other load failure is managed? Because for sure you
need to manage these, they will happen.

Because source can be already terminated

So start it again.

What is the difference between restarting the source and restarting
the destination to retry migration? If the stream is correct it can be
loaded by the destination; if it is broken it won't be accepted at the
source either.

No.  First, the destination has a different qemu version. Second, the file
can be corrupted in transfer. Third, the transfer can fail. Etc ...




and if load is not supported by
the orchestrator and backend, the stream can't be loaded on the source either.

How can an orchestrator not support load but support migration?

I was talking about orchestrators that rely on the old device behavior
of blocking migration. They could attempt migration anyway and check if
it was blocked, which is far from ideal but was OK and safe; now this
becomes dangerous because state can be lost and the VM becomes unloadable.


So we need to ensure that only orchestrators that know what they are doing
and explicitly enable the feature are allowed to start migration.

that seems par for the course - if you want to use a feature you better
have an idea about how to do it.

If orchestrator is doing things like migrating to file
then scp that file, then it better be prepared to
restart VM on source because sometimes it will fail
on destination.

And an orchestrator that is not clever enough to do it, then it
just should not come up with funky ways to do migration.



That said, checking on the destination would need another flag, and the safe
way of using this feature would require managing two flags instead of one,
making it even more fragile. So I'd prefer not to make it more complex.

In my opinion the best way for an orchestrator to use this property is to
leave the default unmigratable behavior at start, and just before migration,
when the destination is known, enumerate all vhost-user-fs devices and set
the properties according to their backends' capabilities with QMP like you
mentioned. This gives us a single point of making the decision for each
device and avoids guessing the future at VM start.

this means that you need to remember what the values were and then
any failure on destination requires you to go back and set them
to original values.

Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-01 Thread Anton Kuchin

On 01/03/2023 17:52, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 05:40:09PM +0200, Anton Kuchin wrote:

So catching errors is not the only purpose of this property, but it
definitely allows us to catch some obvious ones.

OK let's say this. If migration=external then migration just works.
If migration=internal it fails for now. We are agreed here right?

Our argument is whether to check on load or save?

I propose this compromise: two properties:
migration-load and migration-save

migration-load : how is incoming migration handled.
 internal - through qemu
 external - through the daemon

  checked in pre-load

migration-save : how is outgoing migration handled.
 internal - through qemu
 external - through the daemon

  checked in post-save

This way whether the check is on source or destination or both
is up to the user.

Hmm?


Hmm, sounds interesting, this can be a really good compromise.

So migration-save will be "none" by default to protect unaware
orchestrators.

What do you think: should migration-load be "none" too, to force the
orchestrator to set the proper incoming migration type?
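
For reference, a pre_load counterpart to the patch's pre_save hook (quoted
later in this digest) might look like this; the fs->migration_load field
is an assumption following the proposal above, not existing code:

static int vhost_user_fs_pre_load(void *opaque)
{
    VHostUserFS *fs = opaque;
    g_autofree char *path = object_get_canonical_path(OBJECT(fs));

    switch (fs->migration_load) {
    case VHOST_USER_MIGRATION_TYPE_NONE:
        error_report("Incoming migration is blocked by device %s", path);
        break;
    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
        return 0;   /* state is transferred by the back-end, nothing to load */
    default:
        error_report("Migration type '%s' is not supported by device %s",
                     VhostUserMigrationType_str(fs->migration_load), path);
        break;
    }

    return -1;
}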




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-01 Thread Anton Kuchin

On 01/03/2023 17:24, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 05:07:28PM +0200, Anton Kuchin wrote:

On 28/02/2023 23:24, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote:

On 28/02/2023 16:57, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote:

I really don't understand why and what do you want to check on
destination.

Yes I understand your patch controls source. Let me try to rephrase
why I think it's better on destination.
Here's my understanding
- With vhost-user-fs state lives inside an external daemon.
A- If after load you connect to the same daemon you can get migration mostly
 for free.
B- If you connect to a different daemon then that daemon will need
 to pass information from original one.

Is this a fair summary?

Current solution is to set flag on the source meaning "I have an
orchestration tool that will make sure that either A or B is correct".

However both A and B can only be known when destination is known.
Especially as long as what we are really trying to do is just allow qemu
restarts, checking the flag on load will thus achieve it in a cleaner
way, in that the orchestration tool can reasonably keep the flag
clear normally and only set it if restarting qemu locally.


By comparison, with your approach orchestration tool will have
to either always set the flag (risky since then we lose the
extra check that we coded) or keep it clear and set before migration
(complex).

I hope I explained what and why I want to check.

I am far from a vhost-user-fs expert so maybe I am wrong but
I wanted to make sure I got the point across even if others
disagree.


Thank you for the explanation. Now I understand your concerns.

You are right about this mechanism being a bit risky if the orchestrator is
not using it properly, or clunky if it is used in the safest possible way.
That's why first attempt of this feature was with migration capability
to let orchestrator choose behavior right at the moment of migration.
But it has its own problems.

We can't move this check only to destination because one of main goals
was to prevent orchestrators that are unaware of vhost-user-fs specifics
from accidentally migrating such VMs. We can't rely here entirely on
destination to block this because if VM is migrated to file and then
can't be loaded by destination there is no way to fallback and resume
the source so we need to have some kind of blocker on source by default.

Interesting.  Why is there no way? Just load it back on source? Isn't
this how any other load failure is managed? Because for sure you
need to manage these, they will happen.

Because source can be already terminated

So start it again.


What is the difference between restarting the source and restarting
the destination to retry migration? If the stream is correct it can be
loaded by the destination; if it is broken it won't be accepted at the
source either.


and if load is not supported by
the orchestrator and backend, the stream can't be loaded on the source either.

How can an orchestrator not support load but support migration?


I was talking about orchestrators that rely on the old device behavior
of blocking migration. They could attempt migration anyway and check if
it was blocked, which is far from ideal but was OK and safe; now this
becomes dangerous because state can be lost and the VM becomes unloadable.




So we need to ensure that only orchestrators that know what they are doing
and explicitly enable the feature are allowed to start migration.

that seems par for the course - if you want to use a feature you better
have an idea about how to do it.

If orchestrator is doing things like migrating to file
then scp that file, then it better be prepared to
restart VM on source because sometimes it will fail
on destination.

And an orchestrator that is not clever enough to do it, then it
just should not come up with funky ways to do migration.



That said, checking on the destination would need another flag, and the safe
way of using this feature would require managing two flags instead of one,
making it even more fragile. So I'd prefer not to make it more complex.

In my opinion the best way for an orchestrator to use this property is to
leave the default unmigratable behavior at start, and just before migration,
when the destination is known, enumerate all vhost-user-fs devices and set
the properties according to their backends' capabilities with QMP like you
mentioned. This gives us a single point of making the decision for each
device and avoids guessing the future at VM start.

this means that you need to remember what the values were and then
any failure on destination requires you to go back and set them
to original values. With possibility of crashes on the orchestrator
you also need to recall the temporary values in some file ...
This is huge complexity much worse than two flags.

Assuming we need two, let's see whether just reload on source is good enough.

Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-01 Thread Anton Kuchin



On 01/03/2023 16:46, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 05:03:03PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 01.03.23 00:24, Michael S. Tsirkin wrote:

Said that checking on destination would need another flag and the safe
way of using this feature would require managing two flags instead of one
making it even more fragile. So I'd prefer not to make it more complex.

In my opinion the best way to use this property by orchestrator is to
leave default unmigratable behavior at start and just before migration when
destination is known enumerate all vhost-user-fs devices and set properties
according to their backends capability with QMP like you mentioned. This
gives us single point of making the decision for each device and avoids
guessing future at VM start.

this means that you need to remember what the values were and then
any failure on destination requires you to go back and set them
to original values.

Why do we need to restore old values?

To get back to where you were before you were starting migration.


For me, this new property is a kind of per-device migration
capability. Do we care to restore migration capabilities to the values
that they had before setting them for failed migration? We don't need
it, as we just always set capabilities as we want before each
migration. Same thing for this new property: just set it properly
before migration and you don't need to care about restoring it after
failed migration attempt.

If you really trust your management then we can just remove the
migration blocker and be done with it. All this song and dance
with changing properties is to catch errors. If one has to
carefully play with QOM to achieve the desired result then
IMHO we failed in this.


To be honest I would prefer just removing the blocker, because if the
orchestrator doesn't know what it is doing it has lots of different ways
to break things and we can't do anything about it.
Just like vhost-user-scsi always allows migration since the day it was
introduced without additional checks and relies on the orchestrator.
But migration was blocked for vhost-user-fs when it was initially merged
and it is bad to change this contract now.

This property here is not only to block migration by default and catch
errors but really to select the migration type. External migration can
sometimes be preferred even after internal is implemented, because it
requires fewer calls to the backend to extract internal state, and less
code to execute in order to save and restore the daemon state.
And this will also allow compatibility: old VMs that support only
external migration can move to internal migration without a reboot
someday when it is implemented.

So catching errors is not the only purpose of this property, but it
definitely allows us to catch some obvious ones.




With possibility of crashes on the orchestrator
you also need to recall the temporary values in some file ...
This is huge complexity much worse than two flags.

Assuming we need two let's see whether just reload on source is good
enough.


--
Best regards,
Vladimir




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-03-01 Thread Anton Kuchin

On 28/02/2023 23:24, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote:

On 28/02/2023 16:57, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote:

I really don't understand why and what do you want to check on
destination.

Yes I understand your patch controls source. Let me try to rephrase
why I think it's better on destination.
Here's my understanding
- With vhost-user-fs state lives inside an external daemon.
A- If after load you connect to the same daemon you can get migration mostly
for free.
B- If you connect to a different daemon then that daemon will need
to pass information from original one.

Is this a fair summary?

Current solution is to set flag on the source meaning "I have an
orchestration tool that will make sure that either A or B is correct".

However both A and B can only be known when destination is known.
Especially as long as what we are really trying to do is just allow qemu
restarts, checking the flag on load will thus achieve it in a cleaner
way, in that the orchestration tool can reasonably keep the flag
clear normally and only set it if restarting qemu locally.


By comparison, with your approach orchestration tool will have
to either always set the flag (risky since then we lose the
extra check that we coded) or keep it clear and set before migration
(complex).

I hope I explained what and why I want to check.

I am far from a vhost-user-fs expert so maybe I am wrong but
I wanted to make sure I got the point across even if others
disagree.


Thank you for the explanation. Now I understand your concerns.

You are right about this mechanism being a bit risky if the orchestrator is
not using it properly, or clunky if it is used in the safest possible way.
That's why first attempt of this feature was with migration capability
to let orchestrator choose behavior right at the moment of migration.
But it has its own problems.

We can't move this check only to destination because one of main goals
was to prevent orchestrators that are unaware of vhost-user-fs specifics
from accidentally migrating such VMs. We can't rely here entirely on
destination to block this because if VM is migrated to file and then
can't be loaded by destination there is no way to fallback and resume
the source so we need to have some kind of blocker on source by default.

Interesting.  Why is there no way? Just load it back on source? Isn't
this how any other load failure is managed? Because for sure you
need to manage these, they will happen.


Because the source can be already terminated, and if load is not supported
by the orchestrator and backend, the stream can't be loaded on the source
either. So we need to ensure that only orchestrators that know what they
are doing and explicitly enable the feature are allowed to start migration.




That said, checking on the destination would need another flag, and the safe
way of using this feature would require managing two flags instead of one,
making it even more fragile. So I'd prefer not to make it more complex.

In my opinion the best way for an orchestrator to use this property is to
leave the default unmigratable behavior at start, and just before migration,
when the destination is known, enumerate all vhost-user-fs devices and set
the properties according to their backends' capabilities with QMP like you
mentioned. This gives us a single point of making the decision for each
device and avoids guessing the future at VM start.

this means that you need to remember what the values were and then
any failure on destination requires you to go back and set them
to original values. With possibility of crashes on the orchestrator
you also need to recall the temporary values in some file ...
This is huge complexity much worse than two flags.

Assuming we need two let's see whether just reload on source is good
enough.


Reload on source can't be guaranteed to work either. And even if we could
guarantee it to work, we would also need to set up its incoming migration
type in case the outgoing migration fails.
If the orchestrator crashes and restarts it can revert the flags for all
devices, or it can rely on the next migration code to set them up
correctly, because they have no effect between migrations anyway.

Reverting a migration that failed on the destination is not an easy task
either. It seems to be much more complicated than refusing to migrate on
the source.

I believe we should perform sanity checks if we have the data, but
engineering additional checks and putting extra restrictions just to
prevent the orchestrator from doing wrong things is overkill.


But allowing setup via command-line is valid too because some backends may
always be capable of external migration independent of hosts and don't need
the manipulations with QMP before migration at all.

I am much more worried that the realistic scenario is hard to manage
safely than about theoretical state-migrating backends that don't exist.






Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-28 Thread Anton Kuchin

On 28/02/2023 16:57, Michael S. Tsirkin wrote:

On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote:

I really don't understand why and what do you want to check on
destination.

Yes I understand your patch controls source. Let me try to rephrase
why I think it's better on destination.
Here's my understanding
- With vhost-user-fs state lives inside an external daemon.
A- If after load you connect to the same daemon you can get migration mostly
   for free.
B- If you connect to a different daemon then that daemon will need
   to pass information from original one.

Is this a fair summary?

Current solution is to set flag on the source meaning "I have an
orchestration tool that will make sure that either A or B is correct".

However both A and B can only be known when destination is known.
Especially as long as what we are really trying to do is just allow qemu
restarts, checking the flag on load will thus achieve it in a cleaner
way, in that the orchestration tool can reasonably keep the flag
clear normally and only set it if restarting qemu locally.


By comparison, with your approach orchestration tool will have
to either always set the flag (risky since then we lose the
extra check that we coded) or keep it clear and set before migration
(complex).

I hope I explained what and why I want to check.

I am far from a vhost-user-fs expert so maybe I am wrong but
I wanted to make sure I got the point across even if others
disagree.



Thank you for the explanation. Now I understand your concerns.

You are right about this mechanism being a bit risky if the orchestrator is
not using it properly, or clunky if it is used in the safest possible way.
That's why first attempt of this feature was with migration capability
to let orchestrator choose behavior right at the moment of migration.
But it has its own problems.

We can't move this check only to destination because one of main goals
was to prevent orchestrators that are unaware of vhost-user-fs specifics
from accidentally migrating such VMs. We can't rely here entirely on
destination to block this because if VM is migrated to file and then
can't be loaded by destination there is no way to fallback and resume
the source so we need to have some kind of blocker on source by default.

Said that checking on destination would need another flag and the safe
way of using this feature would require managing two flags instead of one
making it even more fragile. So I'd prefer not to make it more complex.

In my opinion the best way for an orchestrator to use this property is to
leave the default unmigratable behavior at start, and just before migration,
when the destination is known, enumerate all vhost-user-fs devices and set
the properties according to their backends' capabilities with QMP like you
mentioned. This gives us a single point of making the decision for each
device and avoids guessing the future at VM start.
But allowing setup via command-line is valid too because some backends may
always be capable of external migration independent of hosts and don't need
the manipulations with QMP before migration at all.
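
As a concrete example of the QMP flow mentioned above, the orchestrator
could flip the property just before migration with qom-set; the device
path and the property name here are illustrative, not fixed by the patch:

{ "execute": "qom-set",
  "arguments": { "path": "/machine/peripheral/vufs0",
                 "property": "migration",
                 "value": "external" } }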




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-28 Thread Anton Kuchin

On 24/02/2023 10:47, Michael S. Tsirkin wrote:

On Thu, Feb 23, 2023 at 04:24:43PM -0500, Stefan Hajnoczi wrote:

On Thu, Feb 23, 2023 at 02:36:33AM -0500, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 03:21:42PM -0500, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 08:25:19PM +0200, Anton Kuchin wrote:

On 22/02/2023 19:12, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote:

On 22/02/2023 18:51, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote:

On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote:

On 22.02.23 17:25, Anton Kuchin wrote:

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not
supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}

Should we also add this as .pre_load, to force user select
correct migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.

But destination is a "source" for next migration, so there shouldn't be
real difference.
The new property has ".realized_set_allowed = true", so, as I understand
it may be changed at any time, so that's not a problem.

Yes, exactly. So destination's property sets not how it will handle this
incoming
migration but the future outgoing one.

How do you know where you are going to migrate though?
I think you don't.
Setting it on source is better since we know where we
are migrating from.

Yes, I don't know where I'm going to migrate to. This is why property
affects only how source saves state on outgoing migration.

Um. I don't get the logic.

For this feature to work we need an orchestrator to manage the migration,
and we generally assume that it is the responsibility of the orchestrator to
ensure matching properties on source and destination.
As the orchestrator manages both sides of the migration it can set the option
(and we can check it) on either source or destination. Right now it's not
important which side we select, because the option is essentially a binary
allow/deny (but IMHO it is much better to refuse to migrate on the source
than to find out later that the state can't be loaded by the destination; in
the case of file migration this becomes especially painful).

But there are plans to add an internal migration option (extract the FUSE
state from the backend and transfer it in the QEMU migration stream), and
that's where setting/checking on the source becomes important, because it
will rely on this property to decide whether extra state from the backend
needs to be put in a migration stream subsection.


If we do internal migration that will be a different property
which has to match on source *and* destination.



If you are concerned about an orchestrator breaking the assumption of
matching properties on source and destination, this is not really supported
AFAIK, but I don't think we need to punish it for this, maybe it has its
reasons: I can imagine a scenario where an orchestrator could want to
migrate from a source with 'migration=external' to a destination with
'migration=none' to ensure that the destination can't be migrated further.

No. I am concerned about a simple practical matter:
- I decide to restart qemu on the same host - so I need to enable
   migration
- Later I decide to migrate qemu to another host - this should be
   blocked


Property on source does not satisfy both at the same time.
Property on destination does.


Stefan what's your take on this? Should we move this from
save to load hook?

This property can be changed on the source at runtime via qom-set, so
you don't need to predict the future. The device can be started from an
incoming migration with "external" and then set to "stateful" migration
to migrate to another host later on.

And then you are supposed to change it back if migration fails?
External tool failures have to be handled ...
How likely is all this fragile dance not to have bugs?

And all of this effort for what? Just to make the case of a buggy
management tool fail a bit faster - is it really worth it?

Compare to setting it on destination where you can set it
on command line or through qom and forget about it.
No?


I really don't understand why and what you want to check on the
destination. This option is supposed to control outgoing migratio


Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-23 Thread Anton Kuchin

On 23/02/2023 23:24, Stefan Hajnoczi wrote:

On Thu, Feb 23, 2023 at 02:36:33AM -0500, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 03:21:42PM -0500, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 08:25:19PM +0200, Anton Kuchin wrote:

On 22/02/2023 19:12, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote:

On 22/02/2023 18:51, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote:

On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote:

On 22.02.23 17:25, Anton Kuchin wrote:

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not
supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}

Should we also add this as .pre_load, to force user select
correct migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.

But destination is a "source" for next migration, so there shouldn't be
real difference.
The new property has ".realized_set_allowed = true", so, as I understand
it may be changed at any time, so that's not a problem.

Yes, exactly. So destination's property sets not how it will handle this
incoming
migration but the future outgoing one.

How do you know where you are going to migrate though?
I think you don't.
Setting it on source is better since we know where we
are migrating from.

Yes, I don't know where I'm going to migrate to. This is why property
affects only how source saves state on outgoing migration.

Um. I don't get the logic.

For this feature to work we need an orchestrator to manage the migration,
and we generally assume that it is the responsibility of the orchestrator to
ensure matching properties on source and destination.
As the orchestrator manages both sides of the migration it can set the option
(and we can check it) on either source or destination. Right now it's not
important which side we select, because the option is essentially a binary
allow/deny (but IMHO it is much better to refuse to migrate on the source
than to find out later that the state can't be loaded by the destination; in
the case of file migration this becomes especially painful).

But there are plans to add an internal migration option (extract the FUSE
state from the backend and transfer it in the QEMU migration stream), and
that's where setting/checking on the source becomes important, because it
will rely on this property to decide whether extra state from the backend
needs to be put in a migration stream subsection.


If we do internal migration that will be a different property
which has to match on source *and* destination.



If you are concerned about an orchestrator breaking the assumption of
matching properties on source and destination, this is not really supported
AFAIK, but I don't think we need to punish it for this, maybe it has its
reasons: I can imagine a scenario where an orchestrator could want to
migrate from a source with 'migration=external' to a destination with
'migration=none' to ensure that the destination can't be migrated further.

No. I am concerned about a simple practical matter:
- I decide to restart qemu on the same host - so I need to enable
   migration
- Later I decide to migrate qemu to another host - this should be
   blocked


Property on source does not satisfy both at the same time.
Property on destination does.


Stefan what's your take on this? Should we move this from
save to load hook?

This property can be changed on the source at runtime via qom-set, so
you don't need to predict the future. The device can be started from an
incoming migration with "external" and then set to "stateful" migration
to migrate to another host later on.

Anton, can you share the virtiofsd patches so we have a full picture of
how "external" migration works? I'd like to understand the workflow and
also how it can be extended when "stateful" migration is added.


Unfortunately the internal implementation relies heavily on our
infrastructure, and rust virtiofsd still lacks dirty tracking, so it is not
ready yet. But I did have a PoC for the now-deprecated C virtiofsd that I
didn't bother to prepare for upstreaming because the C version was declared
unsupported. It essentially adds reconnect and this was the only thing
required from

Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-22 Thread Anton Kuchin

On 22/02/2023 22:21, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 08:25:19PM +0200, Anton Kuchin wrote:

On 22/02/2023 19:12, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote:

On 22/02/2023 18:51, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote:

On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote:

On 22.02.23 17:25, Anton Kuchin wrote:

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not
supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}

Should we also add this as .pre_load, to force user select
correct migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.

But destination is a "source" for next migration, so there shouldn't be
real difference.
The new property has ".realized_set_allowed = true", so, as I understand
it may be changed at any time, so that's not a problem.

Yes, exactly. So destination's property sets not how it will handle this
incoming
migration but the future outgoing one.

How do you know where you are going to migrate though?
I think you don't.
Setting it on source is better since we know where we
are migrating from.

Yes, I don't know where I'm going to migrate to. This is why property
affects only how source saves state on outgoing migration.

Um. I don't get the logic.

For this feature to work we need an orchestrator to manage the migration,
and we generally assume that it is the responsibility of the orchestrator to
ensure matching properties on source and destination.
As the orchestrator manages both sides of the migration it can set the option
(and we can check it) on either source or destination. Right now it's not
important which side we select, because the option is essentially a binary
allow/deny (but IMHO it is much better to refuse to migrate on the source
than to find out later that the state can't be loaded by the destination; in
the case of file migration this becomes especially painful).

But there are plans to add an internal migration option (extract the FUSE
state from the backend and transfer it in the QEMU migration stream), and
that's where setting/checking on the source becomes important, because it
will rely on this property to decide whether extra state from the backend
needs to be put in a migration stream subsection.


If we do internal migration that will be a different property
which has to match on source *and* destination.


I'm not sure if we need another property. The initial idea was to let the
orchestrator set up which part of the state qemu should put into the stream
so that it is sufficient to restore the VM on the destination.
But this depends on how external migration will be implemented.





If you are concerned about an orchestrator breaking the assumption of
matching properties on source and destination, this is not really supported
AFAIK, but I don't think we need to punish it for this, maybe it has its
reasons: I can imagine a scenario where an orchestrator could want to
migrate from a source with 'migration=external' to a destination with
'migration=none' to ensure that the destination can't be migrated further.

No. I am concerned about a simple practical matter:
- I decide to restart qemu on the same host - so I need to enable
   migration
- Later I decide to migrate qemu to another host - this should be
   blocked


Property on source does not satisfy both at the same time.
Property on destination does.


If destination QEMUs on the local and remote hosts have the same properties,
how can we write a check that passes on the same host and fails on the
remote one?
Sorry, I don't understand how qemu can help to handle this. It knows nothing
about the hosts, so it is the responsibility of the management software to
know where it can migrate and to configure it appropriately.

Maybe I didn't understand your scenario or what you propose to check on the
destination. Could you explain a bit more?








This property selects if the VM can migrate and, if it can, what qemu
should put into the migration stream. So we select on the source what type
of migration is allowed for this VM; the destination can't check anything at
load time.

OK, so the new field "migration" regulates only outgoing migration and
does nothing for incoming. On incoming migration the migration

Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-22 Thread Anton Kuchin

On 22/02/2023 19:12, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote:

On 22/02/2023 18:51, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote:

On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote:

On 22.02.23 17:25, Anton Kuchin wrote:

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not
supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}

Should we also add this as .pre_load, to force user select
correct migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.

But destination is a "source" for next migration, so there shouldn't be
real difference.
The new property has ".realized_set_allowed = true", so, as I understand
it may be changed at any time, so that's not a problem.

Yes, exactly. So destination's property sets not how it will handle this
incoming
migration but the future outgoing one.

How do you know where you are going to migrate though?
I think you don't.
Setting it on source is better since we know where we
are migrating from.

Yes, I don't know where I'm going to migrate to. This is why property
affects only how source saves state on outgoing migration.

Um. I don't get the logic.


For this feature to work we need an orchestrator to manage the migration,
and we generally assume that it is the responsibility of the orchestrator to
ensure matching properties on source and destination.
As the orchestrator manages both sides of the migration it can set the option
(and we can check it) on either source or destination. Right now it's not
important which side we select, because the option is essentially a binary
allow/deny (but IMHO it is much better to refuse to migrate on the source
than to find out later that the state can't be loaded by the destination; in
the case of file migration this becomes especially painful).

But there are plans to add an internal migration option (extract the FUSE
state from the backend and transfer it in the QEMU migration stream), and
that's where setting/checking on the source becomes important, because it
will rely on this property to decide whether extra state from the backend
needs to be put in a migration stream subsection.

If you are concerned about an orchestrator breaking the assumption of
matching properties on source and destination, this is not really supported
AFAIK, but I don't think we need to punish it for this, maybe it has its
reasons: I can imagine a scenario where an orchestrator could want to
migrate from a source with 'migration=external' to a destination with
'migration=none' to ensure that the destination can't be migrated further.





This property selects if the VM can migrate and, if it can, what qemu
should put into the migration stream. So we select on the source what type
of migration is allowed for this VM; the destination can't check anything at
load time.

OK, so the new field "migration" regulates only outgoing migration and
does nothing for incoming. On incoming migration the migration stream
itself defines the type of device migration.
Worth mentioning in doc?

Good point. I don't think this deserves a respin, but if I have to send v4
I'll include a clarification in it (a possible wording is sketched below).
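For reference, such a clarification could be a short note in the
@VhostUserMigrationType documentation in qapi/migration.json; a possible
wording (an editorial sketch, not text from the patch):

##
# @VhostUserMigrationType:
#
# Type of vhost-user device migration.
#
# @none: migration of a VM with this device is blocked
#
# @external: device state is transferred out-of-band by the orchestrator,
#            only minimal virtio state is put in the migration stream
#
# Note: this property regulates only outgoing migration; on incoming
# migration the stream itself defines the type of device migration.
##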




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-22 Thread Anton Kuchin

On 22/02/2023 18:43, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 06:14:31PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 22.02.23 17:25, Anton Kuchin wrote:

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}

Should we also add this as .pre_load, to force user select correct 
migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.

But destination is a "source" for the next migration, so there shouldn't be
a real difference.

And whether to allow that migration should be decided by the destination of
that migration.


The destination can just refuse to load unsupported state. But this happens
automatically if the migration code finds an unknown subsection, so it needs
no explicit check in the device's .pre_load.





The new property has ".realized_set_allowed = true", so, as I understand,
it may be changed at any time, so that's not a problem.


This property selects if the VM can migrate and, if it can, what qemu
should put into the migration stream. So we select on the source what type
of migration is allowed for this VM; the destination can't check anything at
load time.

OK, so the new field "migration" regulates only outgoing migration and
does nothing for incoming. On incoming migration the migration stream
itself defines the type of device migration.
Worth mentioning in doc?

--
Best regards,
Vladimir




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-22 Thread Anton Kuchin

On 22/02/2023 18:51, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote:

On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote:

On 22.02.23 17:25, Anton Kuchin wrote:

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not
supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}

Should we also add this as .pre_load, to force user select
correct migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.

But destination is a "source" for next migration, so there shouldn't be
real difference.
The new property has ".realized_set_allowed = true", so, as I understand
it may be changed at any time, so that's not a problem.

Yes, exactly. So destination's property sets not how it will handle this
incoming
migration but the future outgoing one.


How do you know where you are going to migrate though?
I think you don't.
Setting it on source is better since we know where we
are migrating from.


Yes, I don't know where I'm going to migrate to. This is why property
affects only how source saves state on outgoing migration.




This property selects if the VM can migrate and, if it can, what qemu
should put into the migration stream. So we select on the source what type
of migration is allowed for this VM; the destination can't check anything at
load time.

OK, so the new field "migration" regulates only outgoing migration and
does nothing for incoming. On incoming migration the migration stream
itself defines the type of device migration.
Worth mentioning in doc?

Good point. I don't think this deserves a respin, but if I have to send v4
I'll include a clarification in it.




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-22 Thread Anton Kuchin

On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote:

On 22.02.23 17:25, Anton Kuchin wrote:

+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not supported by 
device %s",

+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}
Should we also add this as .pre_load, to force user select correct 
migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.


But destination is a "source" for the next migration, so there shouldn't be
a real difference.
The new property has ".realized_set_allowed = true", so, as I understand,
it may be changed at any time, so that's not a problem.


Yes, exactly. So the destination's property sets not how it will handle this
incoming migration but the future outgoing one.





This property selects if the VM can migrate and, if it can, what qemu
should put into the migration stream. So we select on the source what type
of migration is allowed for this VM; the destination can't check anything at
load time.


OK, so the new field "migration" regulates only outgoing migration and
does nothing for incoming. On incoming migration the migration stream
itself defines the type of device migration.

Worth mentioning in doc?


Good point. I don't think this deserves a respin, but if I have to send v4
I'll include a clarification in it.




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-22 Thread Anton Kuchin

On 22/02/2023 14:43, Michael S. Tsirkin wrote:

On Wed, Feb 22, 2023 at 03:20:00PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 17.02.23 20:00, Anton Kuchin wrote:

Migration of vhost-user-fs device requires transfer of FUSE internal state
from backend. There is no standard way to do it now so by default migration
must be blocked. But if this state can be externally transferred by
orchestrator give it an option to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
   hw/core/qdev-properties-system.c| 10 +
   hw/virtio/vhost-user-fs.c   | 32 -
   include/hw/qdev-properties-system.h |  1 +
   include/hw/virtio/vhost-user-fs.h   |  2 ++
   qapi/migration.json | 16 +++
   5 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index d42493f630..d9b1aa2a5d 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = {
   .set   = set_uuid,
   .set_default_value = set_default_uuid_auto,
   };
+
+const PropertyInfo qdev_prop_vhost_user_migration_type = {
+.name = "VhostUserMigrationType",
+.description = "none/external",
+.enum_table = &VhostUserMigrationType_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+.realized_set_allowed = true,
+};
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..7deb9df5ec 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -298,9 +298,35 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
   return &fs->vhost_dev;
   }
+static int vhost_user_fs_pre_save(void *opaque)
+{
+VHostUserFS *fs = opaque;
+g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+switch (fs->migration_type) {
+case VHOST_USER_MIGRATION_TYPE_NONE:
+error_report("Migration is blocked by device %s", path);
+break;
+case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+return 0;
+default:
+error_report("Migration type '%s' is not supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+break;
+}
+
+return -1;
+}

Should we also add this as .pre_load, to force user select correct 
migration_type on target too?

In fact, I would claim we only want pre_load.
When qemu is started on destination we know where it's migrated
from so this flag can be set.
When qemu is started on source we generally do not yet know so
we don't know whether it's safe to set this flag.


This property selects if the VM can migrate and, if it can, what qemu
should put into the migration stream. So we select on the source what type
of migration is allowed for this VM; the destination can't check anything at
load time.





+
   static const VMStateDescription vuf_vmstate = {
   .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
   };
   static Property vuf_properties[] = {
@@ -309,6 +335,10 @@ static Property vuf_properties[] = {
   DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
  conf.num_request_queues, 1),
   DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type,
+ VHOST_USER_MIGRATION_TYPE_NONE,
+ qdev_prop_vhost_user_migration_type,
+ VhostUserMigrationType),

1. I see other similar qdev_prop_* use DEFINE_PROP_SIGNED.
2. All of them, except only qdev_prop_fdc_drive_type, also define a
convenience macro in include/hw/qdev-properties-system.h.

Should we follow these patterns?


--
Best regards,
Vladimir




Re: [PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-22 Thread Anton Kuchin

On 22/02/2023 14:20, Vladimir Sementsov-Ogievskiy wrote:

On 17.02.23 20:00, Anton Kuchin wrote:
Migration of vhost-user-fs device requires transfer of FUSE internal state
from backend. There is no standard way to do it now so by default migration
must be blocked. But if this state can be externally transferred by
orchestrator give it an option to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
  hw/core/qdev-properties-system.c    | 10 +
  hw/virtio/vhost-user-fs.c   | 32 -
  include/hw/qdev-properties-system.h |  1 +
  include/hw/virtio/vhost-user-fs.h   |  2 ++
  qapi/migration.json | 16 +++
  5 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index d42493f630..d9b1aa2a5d 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = {
  .set   = set_uuid,
  .set_default_value = set_default_uuid_auto,
  };
+
+const PropertyInfo qdev_prop_vhost_user_migration_type = {
+    .name = "VhostUserMigrationType",
+    .description = "none/external",
+    .enum_table = &VhostUserMigrationType_lookup,
+    .get = qdev_propinfo_get_enum,
+    .set = qdev_propinfo_set_enum,
+    .set_default_value = qdev_propinfo_set_default_value_enum,
+    .realized_set_allowed = true,
+};
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..7deb9df5ec 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -298,9 +298,35 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
  return &fs->vhost_dev;
  }
  +static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+    error_report("Migration is blocked by device %s", path);
+    break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+    return 0;
+    default:
+    error_report("Migration type '%s' is not supported by device 
%s",

+ VhostUserMigrationType_str(fs->migration_type), path);
+    break;
+    }
+
+    return -1;
+}


Should we also add this as .pre_load, to force user select correct 
migration_type on target too?


Why do we need it? The enum forces the user to select at least one of the
sane options, and I believe this is enough. This property affects only save,
and I don't know what we could do at load (a hypothetical hook is sketched
below).
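For what it's worth, the hook Vladimir suggests would mirror the pre_save
one; a purely hypothetical sketch (not part of the patch, and whether it
adds value is exactly the open question here):

static int vhost_user_fs_pre_load(void *opaque)
{
    VHostUserFS *fs = opaque;
    g_autofree char *path = object_get_canonical_path(OBJECT(fs));

    /* Hypothetical: require the target device to opt in as well. */
    if (fs->migration_type != VHOST_USER_MIGRATION_TYPE_EXTERNAL) {
        error_report("Device %s is not configured to accept migration", path);
        return -1;
    }

    return 0;
}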




+
  static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .minimum_version_id = 0,
+    .version_id = 0,
+    .fields = (VMStateField[]) {
+    VMSTATE_VIRTIO_DEVICE,
+    VMSTATE_END_OF_LIST()
+    },
+   .pre_save = vhost_user_fs_pre_save,
  };
    static Property vuf_properties[] = {
@@ -309,6 +335,10 @@ static Property vuf_properties[] = {
  DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
 conf.num_request_queues, 1),
  DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 
128),

+    DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type,
+ VHOST_USER_MIGRATION_TYPE_NONE,
+ qdev_prop_vhost_user_migration_type,
+ VhostUserMigrationType),


1. I see, other similar qdev_prop_* use DEFINE_PROP_SIGNED


I don't think this should be signed. Enum values here are non-negative, so
compilers (at least gcc and clang that I checked) evaluate the underlying
enum type to be unsigned int.
I don't know why other property types use signed; maybe they have reasons,
or maybe this is just how they were initially implemented.

2. All of them, except only qdev_prop_fdc_drive_type, also define a
convenience macro in include/hw/qdev-properties-system.h


This makes sense if a property is used in more than one place; in this case
I don't see any benefit from writing more code to handle this specific case.
Maybe if the property finds its usage in other devices this can be done (see
the sketch below).



should we follow these patterns?
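For reference, a convenience macro following that existing pattern could
look like this (a sketch; the macro name is an assumption, nothing like it
is in the patch):

#define DEFINE_PROP_VHOST_USER_MIGRATION_TYPE(_n, _s, _f, _d)     \
    DEFINE_PROP_UNSIGNED(_n, _s, _f, _d,                          \
                         qdev_prop_vhost_user_migration_type,     \
                         VhostUserMigrationType)

With it, the entry in vuf_properties[] would shrink to a single
DEFINE_PROP_VHOST_USER_MIGRATION_TYPE("migration", VHostUserFS,
migration_type, VHOST_USER_MIGRATION_TYPE_NONE) line.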






[PATCH v3 1/1] vhost-user-fs: add migration type property

2023-02-17 Thread Anton Kuchin
Migration of vhost-user-fs device requires transfer of FUSE internal state
from backend. There is no standard way to do it now so by default migration
must be blocked. But if this state can be externally transferred by
orchestrator give it an option to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
 hw/core/qdev-properties-system.c| 10 +
 hw/virtio/vhost-user-fs.c   | 32 -
 include/hw/qdev-properties-system.h |  1 +
 include/hw/virtio/vhost-user-fs.h   |  2 ++
 qapi/migration.json | 16 +++
 5 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index d42493f630..d9b1aa2a5d 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = {
 .set   = set_uuid,
 .set_default_value = set_default_uuid_auto,
 };
+
+const PropertyInfo qdev_prop_vhost_user_migration_type = {
+.name = "VhostUserMigrationType",
+.description = "none/external",
+.enum_table = &VhostUserMigrationType_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+.realized_set_allowed = true,
+};
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..7deb9df5ec 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -298,9 +298,35 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
 return &fs->vhost_dev;
 }
 
+static int vhost_user_fs_pre_save(void *opaque)
+{
+VHostUserFS *fs = opaque;
+g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+switch (fs->migration_type) {
+case VHOST_USER_MIGRATION_TYPE_NONE:
+error_report("Migration is blocked by device %s", path);
+break;
+case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+return 0;
+default:
+error_report("Migration type '%s' is not supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+break;
+}
+
+return -1;
+}
+
 static const VMStateDescription vuf_vmstate = {
 .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
 };
 
 static Property vuf_properties[] = {
@@ -309,6 +335,10 @@ static Property vuf_properties[] = {
 DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
conf.num_request_queues, 1),
 DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type,
+ VHOST_USER_MIGRATION_TYPE_NONE,
+ qdev_prop_vhost_user_migration_type,
+ VhostUserMigrationType),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h
index 0ac327ae60..1a67591590 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -22,6 +22,7 @@ extern const PropertyInfo qdev_prop_audiodev;
 extern const PropertyInfo qdev_prop_off_auto_pcibar;
 extern const PropertyInfo qdev_prop_pcie_link_speed;
 extern const PropertyInfo qdev_prop_pcie_link_width;
+extern const PropertyInfo qdev_prop_vhost_user_migration_type;
 
 #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d)   \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t)
diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h
index 94c3aaa84e..821b2a121c 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -19,6 +19,7 @@
 #include "hw/virtio/vhost-user.h"
 #include "chardev/char-fe.h"
 #include "qom/object.h"
+#include "qapi/qapi-types-migration.h"
 
 #define TYPE_VHOST_USER_FS "vhost-user-fs-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS)
@@ -40,6 +41,7 @@ struct VHostUserFS {
 VirtQueue **req_vqs;
 VirtQueue *hiprio_vq;
 int32_t bootindex;
+VhostUserMigrationType migration_type;
 
 /*< public >*/
 };
diff --git a/qapi/migration.json b/qapi/migration.json
index c84fa10e86..78feb20ffc 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -2178,3 +2178,19 @@
   'data': { 'job-id': 'str',
 'tag': 'str',
 'devices': ['str'] } }
+
+##
+# @VhostUserMigrationType:
+#
+# Type of vhost-user device migration.
+#
+# @none: Migration is not supported, attempts to migrate with this de

[PATCH v3 0/1] virtio-fs: implement option for stateless migration.

2023-02-17 Thread Anton Kuchin
v3:
 - Remove migration_type from migration stream
 - Use enum type for migration_type
 - Get rid of useless cast
 - Fix typos
 - Reword commit message

v2:
 - Use device property instead of migration capability

Anton Kuchin (1):
  vhost-user-fs: add migration type property

 hw/core/qdev-properties-system.c| 10 +
 hw/virtio/vhost-user-fs.c   | 32 -
 include/hw/qdev-properties-system.h |  1 +
 include/hw/virtio/vhost-user-fs.h   |  2 ++
 qapi/migration.json | 16 +++
 5 files changed, 60 insertions(+), 1 deletion(-)

-- 
2.37.2




Re: [PATCH v2 1/1] vhost-user-fs: add property to allow migration

2023-02-16 Thread Anton Kuchin

/* resend with fixed to: and cc: */

On 16/02/2023 18:22, Juan Quintela wrote:

"Michael S. Tsirkin"  wrote:

On Thu, Feb 16, 2023 at 11:11:22AM -0500, Michael S. Tsirkin wrote:

On Thu, Feb 16, 2023 at 03:14:05PM +0100, Juan Quintela wrote:

Anton Kuchin  wrote:

Now any vhost-user-fs device makes VM unmigratable, that also prevents
qemu update without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But it is good to have an option for orchestrator to tune this according to
backend capabilities and migration configuration.

This patch adds device property 'migration' that is 'none' by default
to keep old behaviour but can be set to 'external' to explicitly allow
migration with minimal virtio device state in migration stream if daemon
has some way to sync FUSE state on src and dst without help from qemu.

Signed-off-by: Anton Kuchin 

Reviewed-by: Juan Quintela 

The migration bits are correct.

And I can think of a better way to explain that one device is migrated
externally.


I'm bad at wording but I'll try to improve this one.
Suggestions will be really appreciated.



If you have to respin:


+static int vhost_user_fs_pre_save(void *opaque)
+{
+VHostUserFS *fs = (VHostUserFS *)opaque;

This hack is useless.


I will. Will get rid of that, thanks.


meaning the cast? yes.


I know that there are still lots of code that still have it.


Now remember that I have no clue about vhost-user-fs.

But this looks fishy

  static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_UINT8(migration_type, VHostUserFS),
+VMSTATE_END_OF_LIST()

In fact why do we want to migrate this property?
We generally don't, we only migrate state.

See previous discussion.
In a nutshell, we are going to have internal migration in the future
(not done yet).

Later, Juan.


I think Michael is right. We don't need it at the destination to know
what data is in the stream, because subsections will tell us all we
need to know.




+},
+   .pre_save = vhost_user_fs_pre_save,
  };
  
  static Property vuf_properties[] = {

@@ -309,6 +337,10 @@ static Property vuf_properties[] = {
  DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
 conf.num_request_queues, 1),
  DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type,
+ VHOST_USER_MIGRATION_TYPE_NONE,
+ qdev_prop_vhost_user_migration_type,
+ uint8_t),
  DEFINE_PROP_END_OF_LIST(),

We have four properties here (5 with the new migration one), and you
only migrate one.

This looks fishy, but I don't know if it makes sense.
If they _have_ to be configured the same on source and destination, I
would transfer them and check in post_load that the values are correct.

Later, Juan.

Weird suggestion.  We generally don't do this kind of check - that
would be open-coding each property. It's management's job to make
sure things are consistent.

--
MST












Re: [PATCH v2 1/1] vhost-user-fs: add property to allow migration

2023-02-16 Thread Anton Kuchin

On 16/02/2023 23:09, Stefan Hajnoczi wrote:

On Thu, Feb 16, 2023 at 04:00:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes VM unmigratable, that also prevents
qemu update without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But it is good to have an option for orchestrator to tune this according to
backend capabilities and migration configuration.

This patch adds device property 'migration' that is 'none' by default
to keep old behaviour but can be set to 'external' to explicitly allow
migration with minimal virtio device state in migration stream if daemon
has some way to sync FUSE state on src and dst without help from qemu.

Signed-off-by: Anton Kuchin 
---
  hw/core/qdev-properties-system.c| 10 +
  hw/virtio/vhost-user-fs.c   | 34 -
  include/hw/qdev-properties-system.h |  1 +
  include/hw/virtio/vhost-user-fs.h   |  1 +
  qapi/migration.json | 16 ++
  5 files changed, 61 insertions(+), 1 deletion(-)

Looks okay to me. Comments below.


diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index d42493f630..d9b1aa2a5d 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = {
  .set   = set_uuid,
  .set_default_value = set_default_uuid_auto,
  };
+
+const PropertyInfo qdev_prop_vhost_user_migration_type = {
+.name = "VhostUserMigrationType",
+.description = "none/external",
+.enum_table = &VhostUserMigrationType_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+.realized_set_allowed = true,
+};
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..e2a5b6cfdf 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
  #include "hw/virtio/vhost-user-fs.h"
  #include "monitor/monitor.h"
  #include "sysemu/sysemu.h"
+#include "qapi/qapi-types-migration.h"
  
  static const int user_feature_bits[] = {

  VIRTIO_F_VERSION_1,
@@ -298,9 +299,36 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
  return &fs->vhost_dev;
  }
  
+static int vhost_user_fs_pre_save(void *opaque)

+{
+VHostUserFS *fs = (VHostUserFS *)opaque;
+g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+switch (fs->migration_type) {
+case VHOST_USER_MIGRATION_TYPE_NONE:
+error_report("Migration is blocked by device %s", path);
+break;
+case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+return 0;
+default:
+error_report("Migration type '%s' is not supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+break;
+}
+
+return -1;
+}
+
  static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_UINT8(migration_type, VHostUserFS),

Maybe add a comment since Michael asked what the purpose of this field
is:

   /* For verifying that source/destination migration= properties match */
   VMSTATE_UINT8(migration_type, VHostUserFS),

Come to think of it...where is the code that checks the vmstate
migration_type field matches the destination device's migration=
property?


It's a good idea to have a comment here, thanks.
This field is not really for verifying that source and destination match.
It just describes what data is packed into the stream.
In fact this property could differ and nothing would break: e.g. when we
migrate from a version that can export only a stateless stream to one that
finally supports internal migration.

That said, I'm starting to think that we don't need it in the stream at all,
because the extra data can be detected just by the presence of an additional
subsection (see the sketch below).
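To make that concrete, a minimal sketch of such a subsection (not part of
this patch; the names, the empty field list and the hypothetical 'internal'
migration type are assumptions):

static bool vuf_fuse_state_needed(void *opaque)
{
    VHostUserFS *fs = opaque;

    /* Hypothetical: only emit the extra FUSE state for internal migration. */
    return fs->migration_type == VHOST_USER_MIGRATION_TYPE_INTERNAL;
}

static const VMStateDescription vuf_fuse_state = {
    .name = "vhost-user-fs/fuse-state",
    .version_id = 0,
    .minimum_version_id = 0,
    .needed = vuf_fuse_state_needed,
    .fields = (VMStateField[]) {
        /* FUSE session state fields would be described here. */
        VMSTATE_END_OF_LIST()
    },
};

Hooked into vuf_vmstate via .subsections, this is emitted only when .needed
returns true, so a stateless 'external' stream stays exactly as it is today,
and a destination that does not know the subsection fails the load
automatically.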




+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
  };
  
  static Property vuf_properties[] = {

@@ -309,6 +337,10 @@ static Property vuf_properties[] = {
  DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
 conf.num_request_queues, 1),
  DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type,
+ VHOST_USER_MIGRATION_TYPE_NONE,
+ qdev_prop_vhost_user_migration_type,
+ uint8_t),
  DEFINE_PROP_END_OF_LIST(),
  };
  
diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h

index 

[PATCH v2 0/1] virtio-fs: implement option for stateless migration.

2023-02-16 Thread Anton Kuchin
v2: Use device property instead of migration capability

Anton Kuchin (1):
  vhost-user-fs: add property to allow migration

 hw/core/qdev-properties-system.c| 10 +
 hw/virtio/vhost-user-fs.c   | 34 -
 include/hw/qdev-properties-system.h |  1 +
 include/hw/virtio/vhost-user-fs.h   |  1 +
 qapi/migration.json | 16 ++
 5 files changed, 61 insertions(+), 1 deletion(-)

-- 
2.37.2




[PATCH v2 1/1] vhost-user-fs: add property to allow migration

2023-02-16 Thread Anton Kuchin
Now any vhost-user-fs device makes VM unmigratable, that also prevents
qemu update without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But it is good to have an option for orchestrator to tune this according to
backend capabilities and migration configuration.

This patch adds device property 'migration' that is 'none' by default
to keep old behaviour but can be set to 'external' to explicitly allow
migration with minimal virtio device state in migration stream if daemon
has some way to sync FUSE state on src and dst without help from qemu.

Signed-off-by: Anton Kuchin 
---
 hw/core/qdev-properties-system.c| 10 +
 hw/virtio/vhost-user-fs.c   | 34 -
 include/hw/qdev-properties-system.h |  1 +
 include/hw/virtio/vhost-user-fs.h   |  1 +
 qapi/migration.json | 16 ++
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index d42493f630..d9b1aa2a5d 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = {
 .set   = set_uuid,
 .set_default_value = set_default_uuid_auto,
 };
+
+const PropertyInfo qdev_prop_vhost_user_migration_type = {
+.name = "VhostUserMigrationType",
+.description = "none/external",
+.enum_table = &VhostUserMigrationType_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+.realized_set_allowed = true,
+};
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 83fc20e49e..e2a5b6cfdf 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
 #include "hw/virtio/vhost-user-fs.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
+#include "qapi/qapi-types-migration.h"
 
 static const int user_feature_bits[] = {
 VIRTIO_F_VERSION_1,
@@ -298,9 +299,36 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
 return &fs->vhost_dev;
 }
 
+static int vhost_user_fs_pre_save(void *opaque)
+{
+VHostUserFS *fs = (VHostUserFS *)opaque;
+g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+switch (fs->migration_type) {
+case VHOST_USER_MIGRATION_TYPE_NONE:
+error_report("Migration is blocked by device %s", path);
+break;
+case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+return 0;
+default:
+error_report("Migration type '%s' is not supported by device %s",
+ VhostUserMigrationType_str(fs->migration_type), path);
+break;
+}
+
+return -1;
+}
+
 static const VMStateDescription vuf_vmstate = {
 .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_UINT8(migration_type, VHostUserFS),
+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
 };
 
 static Property vuf_properties[] = {
@@ -309,6 +337,10 @@ static Property vuf_properties[] = {
 DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
conf.num_request_queues, 1),
 DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type,
+ VHOST_USER_MIGRATION_TYPE_NONE,
+ qdev_prop_vhost_user_migration_type,
+ uint8_t),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/qdev-properties-system.h 
b/include/hw/qdev-properties-system.h
index 0ac327ae60..1a67591590 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -22,6 +22,7 @@ extern const PropertyInfo qdev_prop_audiodev;
 extern const PropertyInfo qdev_prop_off_auto_pcibar;
 extern const PropertyInfo qdev_prop_pcie_link_speed;
 extern const PropertyInfo qdev_prop_pcie_link_width;
+extern const PropertyInfo qdev_prop_vhost_user_migration_type;
 
 #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d)   \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t)
diff --git a/include/hw/virtio/vhost-user-fs.h 
b/include/hw/virtio/vhost-user-fs.h
index 94c3aaa84e..3ebce77be5 100644
--- a/include/hw/virtio/vhost-user-fs.h
+++ b/include/hw/virtio/vhost-user-fs.h
@@ -40,6 +40,7 @@ struct VHostUserFS {
 VirtQueue **req_vqs;
 VirtQueue *hiprio_vq;
 int32_t bootindex;
+uint8_t migration_type;
 
 /*< public >*/
 };
diff --git a/qapi/migration.json b/qapi/migration.json
index c84fa10e86..ababd605a2 100644
--- a/qapi/migration.json
+++ b/qapi/migrat
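
The qapi/migration.json hunk above is cut off in the archive. Judging from
the "none/external" description in the PropertyInfo and the constants used
in the patch, the QAPI-generated header would presumably contain an enum
along these lines (a sketch; the exact generated names are assumed rather
than taken from the truncated hunk):

typedef enum VhostUserMigrationType {
    VHOST_USER_MIGRATION_TYPE_NONE,      /* default: block migration */
    VHOST_USER_MIGRATION_TYPE_EXTERNAL,  /* orchestrator preserves FUSE state */
    VHOST_USER_MIGRATION_TYPE__MAX,
} VhostUserMigrationType;

The generated code would also provide the VhostUserMigrationType_lookup
table that the PropertyInfo in hw/core/qdev-properties-system.c refers to.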

Re: [PATCH v4 13/16] pci: introduce pci_find_the_only_child()

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:01, Vladimir Sementsov-Ogievskiy wrote:

To be used in a further patch to identify the device hot-plugged into a
pcie-root-port.

Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Anton Kuchin 

---
  include/hw/pci/pci.h |  1 +
  hw/pci/pci.c | 33 +
  2 files changed, 34 insertions(+)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d5a40cd058..b6c9c44527 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -341,6 +341,7 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus,
  void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
pci_bus_fn end, void *parent_state);
  PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
+PCIDevice *pci_find_the_only_child(PCIBus *bus, int bus_num, Error **errp);
  
  /* Use this wrapper when specific scan order is not required. */

  static inline
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 208c16f450..34fd1fb5b8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1771,6 +1771,39 @@ void pci_for_each_device(PCIBus *bus, int bus_num,
  }
  }
  
+typedef struct TheOnlyChild {
+PCIDevice *dev;
+int count;
+} TheOnlyChild;
+
+static void the_only_child_fn(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+TheOnlyChild *s = opaque;
+
+s->dev = dev;
+s->count++;
+}
+
+PCIDevice *pci_find_the_only_child(PCIBus *bus, int bus_num, Error **errp)
+{
+TheOnlyChild res = {0};
+
+pci_for_each_device(bus, bus_num, the_only_child_fn, &res);
+
+if (!res.dev) {
+assert(res.count == 0);
+error_setg(errp, "No child devices found");
+return NULL;
+}
+
+if (res.count > 1) {
+error_setg(errp, "Several child devices found");
+return NULL;
+}
+
+return res.dev;
+}
+
  const pci_class_desc *get_class_desc(int class)
  {
  const pci_class_desc *desc;
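
As context for the helper's errp contract, here is a hypothetical caller
sketch (not part of the series; get_the_only_child() is an invented name)
showing how a user of the new function is expected to consume the error:

static PCIDevice *get_the_only_child(PCIBus *sec_bus)
{
    Error *err = NULL;
    PCIDevice *child;

    /* The helper returns NULL and sets err to "No child devices found"
     * or "Several child devices found" on failure. */
    child = pci_find_the_only_child(sec_bus, pci_bus_num(sec_bus), &err);
    if (!child) {
        error_report_err(err);  /* prints and frees the error */
        return NULL;
    }
    return child;
}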




Re: [PATCH v4 12/16] pcie: set power indicator to off on reset by default

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

It should not be zero; the only valid values are ON, OFF and BLINK.

Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Anton Kuchin 

---
  hw/pci/pcie.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 90faf0710a..b8c24cf45f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -684,6 +684,7 @@ void pcie_cap_slot_reset(PCIDevice *dev)
   PCI_EXP_SLTCTL_PDCE |
   PCI_EXP_SLTCTL_ABPE);
  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
+   PCI_EXP_SLTCTL_PWR_IND_OFF |
 PCI_EXP_SLTCTL_ATTN_IND_OFF);
  
  if (dev->cap_present & QEMU_PCIE_SLTCAP_PCP) {




Re: [PATCH v4 11/16] pcie: introduce pcie_sltctl_powered_off() helper

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

In pcie_cap_slot_write_config() we check for PCI_EXP_SLTCTL_PWR_OFF in
a bad form. We should distinguish PCI_EXP_SLTCTL_PWR, which is a "mask",
from PCI_EXP_SLTCTL_PWR_OFF, which is a value for that mask.

Better code is in pcie_cap_slot_unplug_request_cb() and in
pcie_cap_update_power(). Let's use the same pattern everywhere. To simplify
things, also add a helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Anton Kuchin 

---
  hw/pci/pcie.c | 16 ++--
  1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index db8360226f..90faf0710a 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -39,6 +39,11 @@
  #define PCIE_DEV_PRINTF(dev, fmt, ...)  \
  PCIE_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__)
  
+static bool pcie_sltctl_powered_off(uint16_t sltctl)
+{
+return (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_OFF
+&& (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF;
+}
  
  /***

   * pci express capability helper functions
@@ -395,6 +400,7 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev)
  
  if (sltcap & PCI_EXP_SLTCAP_PCP) {
  power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON;
+/* Don't we also need to check (sltctl & PCI_EXP_SLTCTL_PIC)? */
  }
  
  pci_for_each_device(sec_bus, pci_bus_num(sec_bus),

@@ -579,8 +585,7 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
  return;
  }
  
-if (((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF) &&
-((sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_OFF)) {
+if (pcie_sltctl_powered_off(sltctl)) {
  /* slot is powered off -> unplug without round-trip to the guest */
  pcie_cap_slot_do_unplug(hotplug_pdev);
  hotplug_event_notify(hotplug_pdev);
@@ -770,10 +775,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
   * this is a work around for guests that overwrite
   * control of powered off slots before powering them on.
   */
-if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
-(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF &&
-(!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
-(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_OFF)) {
+if ((sltsta & PCI_EXP_SLTSTA_PDS) && pcie_sltctl_powered_off(val) &&
+!pcie_sltctl_powered_off(old_slt_ctl))
+{
  pcie_cap_slot_do_unplug(dev);
  }
  pcie_cap_update_power(dev);




Re: [PATCH v4 10/16] pcie: pcie_cap_slot_enable_power() use correct helper

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

*_by_mask() helpers shouldn't be used here (and this is the only such use).
*_by_mask() helpers shift their value argument, but the pcie.c code uses
values that are already shifted appropriately.
Happily, PCI_EXP_SLTCTL_PWR_ON is zero, so the shift doesn't matter. But if
we applied the same helper to the PCI_EXP_SLTCTL_PWR_OFF constant, it would
do the wrong thing.

So, let's use instead pci_word_test_and_clear_mask() which is already
used in the file to clear PCI_EXP_SLTCTL_PWR_OFF bit in
pcie_cap_slot_init() and pcie_cap_slot_reset().

Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Anton Kuchin 

---
  hw/pci/pcie.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index ccdb2377e1..db8360226f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -373,8 +373,8 @@ void pcie_cap_slot_enable_power(PCIDevice *dev)
  uint32_t sltcap = pci_get_long(exp_cap + PCI_EXP_SLTCAP);
  
  if (sltcap & PCI_EXP_SLTCAP_PCP) {
-pci_set_word_by_mask(exp_cap + PCI_EXP_SLTCTL,
- PCI_EXP_SLTCTL_PCC, PCI_EXP_SLTCTL_PWR_ON);
+pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTCTL,
+ PCI_EXP_SLTCTL_PCC);
  }
  }
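
To see why the old call only worked by luck, consider a simplified sketch
of the *_by_mask semantics (an assumption modeled on the behaviour the
commit message describes, not code copied from QEMU): the value argument
is shifted into the mask's bit position before being written.

static inline uint16_t set_word_by_mask(uint16_t reg, uint16_t mask,
                                        uint16_t val)
{
    unsigned shift = ctz32(mask);   /* lowest bit position of the field */

    return (reg & ~mask) | ((val << shift) & mask);
}

PCI_EXP_SLTCTL_PWR_ON is 0x0000, so shifting it changes nothing; an
already-shifted constant like PCI_EXP_SLTCTL_PWR_OFF would be shifted a
second time and write the wrong bits, which is why the patch switches to
pci_word_test_and_clear_mask().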
  




Re: [PATCH v4 09/16] pcie: drop unused PCIExpressIndicator

2023-02-14 Thread Anton Kuchin



On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

The structure type is unused. Also, it's the only user of corresponding
macros, so drop them too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Anton Kuchin 

---
  include/hw/pci/pcie.h  | 8 
  include/hw/pci/pcie_regs.h | 5 -
  2 files changed, 13 deletions(-)

diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 798a262a0a..3cc2b15957 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -27,14 +27,6 @@
  #include "hw/pci/pcie_sriov.h"
  #include "hw/hotplug.h"
  
-typedef enum {
-/* for attention and power indicator */
-PCI_EXP_HP_IND_RESERVED = PCI_EXP_SLTCTL_IND_RESERVED,
-PCI_EXP_HP_IND_ON   = PCI_EXP_SLTCTL_IND_ON,
-PCI_EXP_HP_IND_BLINK= PCI_EXP_SLTCTL_IND_BLINK,
-PCI_EXP_HP_IND_OFF  = PCI_EXP_SLTCTL_IND_OFF,
-} PCIExpressIndicator;
-
  typedef enum {
  /* these bits must match the bits in Slot Control/Status registers.
   * PCI_EXP_HP_EV_xxx = PCI_EXP_SLTCTL_xxxE = PCI_EXP_SLTSTA_xxx
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 00b595a82e..1fe0bdd25b 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -66,11 +66,6 @@ typedef enum PCIExpLinkWidth {
  
  #define PCI_EXP_SLTCAP_PSN_SHIFTctz32(PCI_EXP_SLTCAP_PSN)
  
-#define PCI_EXP_SLTCTL_IND_RESERVED 0x0
-#define PCI_EXP_SLTCTL_IND_ON   0x1
-#define PCI_EXP_SLTCTL_IND_BLINK0x2
-#define PCI_EXP_SLTCTL_IND_OFF  0x3
-
  #define PCI_EXP_SLTCTL_SUPPORTED\
  (PCI_EXP_SLTCTL_ABPE |  \
   PCI_EXP_SLTCTL_PDCE |  \




Re: [PATCH v4 08/16] pcie_regs: drop duplicated indicator value macros

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

We already have indicator values in
include/standard-headers/linux/pci_regs.h, so there is no reason to
reinvent them in include/hw/pci/pcie_regs.h. (We already use
PCI_EXP_SLTCTL_PWR_IND_BLINK and PCI_EXP_SLTCTL_PWR_IND_OFF in
hw/pci/pcie.c, so let's be consistent.)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Anton Kuchin 

---
  include/hw/pci/pcie_regs.h |  9 -
  hw/pci/pcie.c  | 13 +++--
  2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 963dc2e170..00b595a82e 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -70,15 +70,6 @@ typedef enum PCIExpLinkWidth {
  #define PCI_EXP_SLTCTL_IND_ON   0x1
  #define PCI_EXP_SLTCTL_IND_BLINK0x2
  #define PCI_EXP_SLTCTL_IND_OFF  0x3
-#define PCI_EXP_SLTCTL_AIC_SHIFTctz32(PCI_EXP_SLTCTL_AIC)
-#define PCI_EXP_SLTCTL_AIC_OFF  \
-(PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_AIC_SHIFT)
-
-#define PCI_EXP_SLTCTL_PIC_SHIFTctz32(PCI_EXP_SLTCTL_PIC)
-#define PCI_EXP_SLTCTL_PIC_OFF  \
-(PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_PIC_SHIFT)
-#define PCI_EXP_SLTCTL_PIC_ON  \
-(PCI_EXP_SLTCTL_IND_ON << PCI_EXP_SLTCTL_PIC_SHIFT)
  
  #define PCI_EXP_SLTCTL_SUPPORTED\
  (PCI_EXP_SLTCTL_ABPE |  \
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 82ef723983..ccdb2377e1 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -634,8 +634,8 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
   PCI_EXP_SLTCTL_PIC |
   PCI_EXP_SLTCTL_AIC);
  pci_word_test_and_set_mask(dev->config + pos + PCI_EXP_SLTCTL,
-   PCI_EXP_SLTCTL_PIC_OFF |
-   PCI_EXP_SLTCTL_AIC_OFF);
+   PCI_EXP_SLTCTL_PWR_IND_OFF |
+   PCI_EXP_SLTCTL_ATTN_IND_OFF);
  pci_word_test_and_set_mask(dev->wmask + pos + PCI_EXP_SLTCTL,
 PCI_EXP_SLTCTL_PIC |
 PCI_EXP_SLTCTL_AIC |
@@ -679,7 +679,7 @@ void pcie_cap_slot_reset(PCIDevice *dev)
   PCI_EXP_SLTCTL_PDCE |
   PCI_EXP_SLTCTL_ABPE);
  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
-   PCI_EXP_SLTCTL_AIC_OFF);
+   PCI_EXP_SLTCTL_ATTN_IND_OFF);
  
  if (dev->cap_present & QEMU_PCIE_SLTCAP_PCP) {

  /* Downstream ports enforce device number 0. */
@@ -694,7 +694,8 @@ void pcie_cap_slot_reset(PCIDevice *dev)
 PCI_EXP_SLTCTL_PCC);
  }
  
-pic = populated ? PCI_EXP_SLTCTL_PIC_ON : PCI_EXP_SLTCTL_PIC_OFF;
+pic = populated ?
+PCI_EXP_SLTCTL_PWR_IND_ON : PCI_EXP_SLTCTL_PWR_IND_OFF;
  pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL, pic);
  }
  
@@ -770,9 +771,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev,

   * control of powered off slots before powering them on.
   */
  if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
-(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PIC_OFF &&
+(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF &&
  (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
-(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PIC_OFF)) {
+(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_OFF)) {
  pcie_cap_slot_do_unplug(dev);
  }
  pcie_cap_update_power(dev);




Re: [PATCH v4 07/16] pcie: pcie_cap_slot_write_config(): use correct macro

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

PCI_EXP_SLTCTL_PIC_OFF is a value, and PCI_EXP_SLTCTL_PIC is a mask.
Happily, PCI_EXP_SLTCTL_PIC_OFF is the maximum value for this mask and is
equal to the mask itself. Still, the code looks like a bug. Let's make
it more reader-friendly.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 
---
  hw/pci/pcie.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 924fdabd15..82ef723983 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -770,9 +770,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
   * control of powered off slots before powering them on.
   */
  if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
-(val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF &&
+(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PIC_OFF &&
  (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
-(old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
+(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PIC_OFF)) {
  pcie_cap_slot_do_unplug(dev);
      }
  pcie_cap_update_power(dev);

Reviewed-by: Anton Kuchin 
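
As a side note on the mask/value distinction: PCI_EXP_SLTCTL_PIC (0x0300,
from the standard Linux pci_regs.h header) is the two-bit Power Indicator
Control field mask, while PCI_EXP_SLTCTL_PWR_IND_ON (0x0100), _BLINK
(0x0200) and _OFF (0x0300) are values of that field. A minimal
illustrative check (not taken from the series):

static bool power_indicator_off(uint16_t sltctl)
{
    /* Extract the field with the mask, then compare against the value. */
    return (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF;
}

The old test (val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF only
behaved correctly because the "off" value (0b11) fills the whole field;
applied to "on" (0b01) or "blink" (0b10) the same pattern would also match
when the other bit of the field is set.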



Re: [PATCH v4 06/16] pci/shpc: refactor shpc_device_plug_common()

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

Rename it to shpc_device_get_slot(), to state what it does rather
than how it is used. This also helps to reuse it in a further commit.

Also, add a return value and get rid of local_err.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/pci/shpc.c | 19 ---
  1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 9f964b1d70..e7bc7192f1 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -496,8 +496,9 @@ static const MemoryRegionOps shpc_mmio_ops = {
  .max_access_size = 4,
  },
  };
-static void shpc_device_plug_common(PCIDevice *affected_dev, int *slot,
-SHPCDevice *shpc, Error **errp)
+
+static bool shpc_device_get_slot(PCIDevice *affected_dev, int *slot,
+ SHPCDevice *shpc, Error **errp)
  {
  int pci_slot = PCI_SLOT(affected_dev->devfn);
  *slot = SHPC_PCI_TO_IDX(pci_slot);
@@ -507,21 +508,20 @@ static void shpc_device_plug_common(PCIDevice *affected_dev, int *slot,
 "controller. Valid slots are between %d and %d.",
 pci_slot, SHPC_IDX_TO_PCI(0),
 SHPC_IDX_TO_PCI(shpc->nslots) - 1);
-return;
+return false;
  }
+
+return true;
  }
  
  void shpc_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
  Error **errp)
  {
-Error *local_err = NULL;
  PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
  SHPCDevice *shpc = pci_hotplug_dev->shpc;
  int slot;
  
-shpc_device_plug_common(PCI_DEVICE(dev), &slot, shpc, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!shpc_device_get_slot(PCI_DEVICE(dev), &slot, shpc, errp)) {
  return;
  }
  
@@ -563,16 +563,13 @@ void shpc_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,

  void shpc_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
  {
-Error *local_err = NULL;
  PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
  SHPCDevice *shpc = pci_hotplug_dev->shpc;
  uint8_t state;
  uint8_t led;
  int slot;
  
-shpc_device_plug_common(PCI_DEVICE(dev), &slot, shpc, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!shpc_device_get_slot(PCI_DEVICE(dev), &slot, shpc, errp)) {
      return;
  }
  

Reviewed-by: Anton Kuchin 



Re: [PATCH v4 05/16] pci/shpc: pass PCIDevice pointer to shpc_slot_command()

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

We'll need it in a further patch to report the bridge in a QAPI event.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/pci/shpc.c | 18 ++
  1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 959dc470f3..9f964b1d70 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -263,9 +263,10 @@ static bool shpc_slot_is_off(uint8_t state, uint8_t power, uint8_t attn)
  return state == SHPC_STATE_DISABLED && power == SHPC_LED_OFF;
  }
  
-static void shpc_slot_command(SHPCDevice *shpc, uint8_t target,
+static void shpc_slot_command(PCIDevice *d, uint8_t target,
uint8_t state, uint8_t power, uint8_t attn)
  {
+SHPCDevice *shpc = d->shpc;
  int slot = SHPC_LOGICAL_TO_IDX(target);
  uint8_t old_state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK);
  uint8_t old_power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
@@ -314,8 +315,9 @@ static void shpc_slot_command(SHPCDevice *shpc, uint8_t target,
  }
  }
  
-static void shpc_command(SHPCDevice *shpc)
+static void shpc_command(PCIDevice *d)
  {
+SHPCDevice *shpc = d->shpc;
  uint8_t code = pci_get_byte(shpc->config + SHPC_CMD_CODE);
  uint8_t speed;
  uint8_t target;
@@ -336,7 +338,7 @@ static void shpc_command(SHPCDevice *shpc)
  state = (code & SHPC_SLOT_STATE_MASK) >> SHPC_SLOT_STATE_SHIFT;
  power = (code & SHPC_SLOT_PWR_LED_MASK) >> SHPC_SLOT_PWR_LED_SHIFT;
  attn = (code & SHPC_SLOT_ATTN_LED_MASK) >> SHPC_SLOT_ATTN_LED_SHIFT;
-shpc_slot_command(shpc, target, state, power, attn);
+shpc_slot_command(d, target, state, power, attn);
  break;
  case 0x40 ... 0x47:
  speed = code & SHPC_SEC_BUS_MASK;
@@ -354,10 +356,10 @@ static void shpc_command(SHPCDevice *shpc)
  }
  for (i = 0; i < shpc->nslots; ++i) {
  if (!(shpc_get_status(shpc, i, SHPC_SLOT_STATUS_MRL_OPEN))) {
-shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN,
+shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN,
SHPC_STATE_PWRONLY, SHPC_LED_ON, SHPC_LED_NO);
  } else {
-shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN,
+shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN,
SHPC_STATE_NO, SHPC_LED_OFF, SHPC_LED_NO);
  }
  }
@@ -375,10 +377,10 @@ static void shpc_command(SHPCDevice *shpc)
  }
  for (i = 0; i < shpc->nslots; ++i) {
  if (!(shpc_get_status(shpc, i, SHPC_SLOT_STATUS_MRL_OPEN))) {
-shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN,
+shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN,
SHPC_STATE_ENABLED, SHPC_LED_ON, SHPC_LED_NO);
  } else {
-shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN,
+shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN,
SHPC_STATE_NO, SHPC_LED_OFF, SHPC_LED_NO);
  }
  }
@@ -410,7 +412,7 @@ static void shpc_write(PCIDevice *d, unsigned addr, uint64_t val, int l)
  shpc->config[a] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
  }
  if (ranges_overlap(addr, l, SHPC_CMD_CODE, 2)) {
-shpc_command(shpc);
+shpc_command(d);
  }
  shpc_interrupt_update(d);
  }

Reviewed-by: Anton Kuchin 



Re: [PATCH v4 04/16] pci/shpc: more generic handle hot-unplug in shpc_slot_command()

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

Free the slot if both conditions (power-led = OFF and state = DISABLED)
become true, regardless of the sequence. This is similar to how PCIe
hotplug works.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/pci/shpc.c | 52 ++-
  1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 25e4172382..959dc470f3 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -258,49 +258,59 @@ static void shpc_free_devices_in_slot(SHPCDevice *shpc, int slot)
  }
  }
  
+static bool shpc_slot_is_off(uint8_t state, uint8_t power, uint8_t attn)
+{
+return state == SHPC_STATE_DISABLED && power == SHPC_LED_OFF;
+}
+
  static void shpc_slot_command(SHPCDevice *shpc, uint8_t target,
uint8_t state, uint8_t power, uint8_t attn)
  {
-uint8_t current_state;
  int slot = SHPC_LOGICAL_TO_IDX(target);
+uint8_t old_state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK);
+uint8_t old_power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
+uint8_t old_attn = shpc_get_status(shpc, slot, SHPC_SLOT_ATTN_LED_MASK);
+
  if (target < SHPC_CMD_TRGT_MIN || slot >= shpc->nslots) {
  shpc_invalid_command(shpc);
  return;
  }
-current_state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK);
-if (current_state == SHPC_STATE_ENABLED && state == SHPC_STATE_PWRONLY) {
+
+if (old_state == SHPC_STATE_ENABLED && state == SHPC_STATE_PWRONLY) {
  shpc_invalid_command(shpc);
  return;
  }
  
-if (power != SHPC_LED_NO) {
+if (power == SHPC_LED_NO) {
+power = old_power;
+} else {
  /* TODO: send event to monitor */
  shpc_set_status(shpc, slot, power, SHPC_SLOT_PWR_LED_MASK);
  }
-if (attn != SHPC_LED_NO) {
+
+if (attn == SHPC_LED_NO) {
+attn = old_attn;
+} else {
  /* TODO: send event to monitor */
  shpc_set_status(shpc, slot, attn, SHPC_SLOT_ATTN_LED_MASK);
  }
-if (state != SHPC_STATE_NO) {
+
+if (state == SHPC_STATE_NO) {
+state = old_state;
+} else {
  shpc_set_status(shpc, slot, state, SHPC_SLOT_STATE_MASK);
  }
  
-if ((current_state == SHPC_STATE_ENABLED ||
- current_state == SHPC_STATE_PWRONLY) &&
-state == SHPC_STATE_DISABLED)
+if (!shpc_slot_is_off(old_state, old_power, old_attn) &&
+shpc_slot_is_off(state, power, attn))
  {
-power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
-/* TODO: track what monitor requested. */
-/* Look at LED to figure out whether it's ok to remove the device. */
-if (power == SHPC_LED_OFF) {
-shpc_free_devices_in_slot(shpc, slot);
-shpc_set_status(shpc, slot, 1, SHPC_SLOT_STATUS_MRL_OPEN);
-shpc_set_status(shpc, slot, SHPC_SLOT_STATUS_PRSNT_EMPTY,
-SHPC_SLOT_STATUS_PRSNT_MASK);
-shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |=
-SHPC_SLOT_EVENT_MRL |
-SHPC_SLOT_EVENT_PRESENCE;
-}
+shpc_free_devices_in_slot(shpc, slot);
+shpc_set_status(shpc, slot, 1, SHPC_SLOT_STATUS_MRL_OPEN);
+shpc_set_status(shpc, slot, SHPC_SLOT_STATUS_PRSNT_EMPTY,
+SHPC_SLOT_STATUS_PRSNT_MASK);
+shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |=
+SHPC_SLOT_EVENT_MRL |
+    SHPC_SLOT_EVENT_PRESENCE;
  }
  }
  

Reviewed-by: Anton Kuchin 



Re: [PATCH v4 03/16] pci/shpc: shpc_slot_command(): handle PWRONLY -> ENABLED transition

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

The ENABLED -> PWRONLY transition is not allowed and we handle it with
shpc_invalid_command(). But the PWRONLY -> ENABLED transition is silently
ignored, which seems wrong. Let's handle it as a correct transition.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/pci/shpc.c | 24 +---
  1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 5d71569b13..25e4172382 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -273,28 +273,22 @@ static void shpc_slot_command(SHPCDevice *shpc, uint8_t target,
  return;
  }
  
-switch (power) {
-case SHPC_LED_NO:
-break;
-default:
+if (power != SHPC_LED_NO) {
  /* TODO: send event to monitor */
  shpc_set_status(shpc, slot, power, SHPC_SLOT_PWR_LED_MASK);
  }
-switch (attn) {
-case SHPC_LED_NO:
-break;
-default:
+if (attn != SHPC_LED_NO) {
  /* TODO: send event to monitor */
  shpc_set_status(shpc, slot, attn, SHPC_SLOT_ATTN_LED_MASK);
  }
-
-if ((current_state == SHPC_STATE_DISABLED && state == SHPC_STATE_PWRONLY) ||
-(current_state == SHPC_STATE_DISABLED && state == SHPC_STATE_ENABLED)) {
-shpc_set_status(shpc, slot, state, SHPC_SLOT_STATE_MASK);
-} else if ((current_state == SHPC_STATE_ENABLED ||
-current_state == SHPC_STATE_PWRONLY) &&
-   state == SHPC_STATE_DISABLED) {
+if (state != SHPC_STATE_NO) {
  shpc_set_status(shpc, slot, state, SHPC_SLOT_STATE_MASK);
+}
+
+if ((current_state == SHPC_STATE_ENABLED ||
+ current_state == SHPC_STATE_PWRONLY) &&
+state == SHPC_STATE_DISABLED)
+{
  power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
  /* TODO: track what monitor requested. */
  /* Look at LED to figure out whether it's ok to remove the device. */

Reviewed-by: Anton Kuchin 



Re: [PATCH v4 02/16] pci/shpc: change shpc_get_status() return type to uint8_t

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

The result of the function is always one byte, and it is always
assigned to a uint8_t variable. Also, shpc_get_status() should be
symmetric to shpc_set_status(), which has a uint8_t value argument.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/pci/shpc.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 1b3f619dc9..5d71569b13 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -123,10 +123,13 @@
  #define SHPC_PCI_TO_IDX(pci_slot) ((pci_slot) - 1)
  #define SHPC_IDX_TO_PHYSICAL(slot) ((slot) + 1)
  
-static uint16_t shpc_get_status(SHPCDevice *shpc, int slot, uint16_t msk)
+static uint8_t shpc_get_status(SHPCDevice *shpc, int slot, uint16_t msk)
  {
  uint8_t *status = shpc->config + SHPC_SLOT_STATUS(slot);
-return (pci_get_word(status) & msk) >> ctz32(msk);
+uint16_t result = (pci_get_word(status) & msk) >> ctz32(msk);
+
+assert(result <= UINT8_MAX);
+return result;
  }
  
  static void shpc_set_status(SHPCDevice *shpc,

Reviewed-by: Anton Kuchin 



Re: [PATCH v4 01/16] pci/shpc: set attention led to OFF on reset

2023-02-14 Thread Anton Kuchin

On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

0 is not a valid state for the led. Let's start with OFF.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/pci/shpc.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index fca7f6691a..1b3f619dc9 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -223,6 +223,7 @@ void shpc_reset(PCIDevice *d)
  SHPC_SLOT_STATUS_PRSNT_MASK);
  shpc_set_status(shpc, i, SHPC_LED_OFF, SHPC_SLOT_PWR_LED_MASK);
  }
+shpc_set_status(shpc, i, SHPC_LED_OFF, SHPC_SLOT_ATTN_LED_MASK);
  shpc_set_status(shpc, i, 0, SHPC_SLOT_STATUS_66);
  }
  shpc_set_sec_bus_speed(shpc, SHPC_SEC_BUS_33);

Reviewed-by: Anton Kuchin 



Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-02-10 Thread Anton Kuchin

On 02/02/2023 11:59, Juan Quintela wrote:

Anton Kuchin  wrote:

On 01/02/2023 16:26, Juan Quintela wrote:

Anton Kuchin  wrote:

On 19/01/2023 18:02, Stefan Hajnoczi wrote:

On Thu, 19 Jan 2023 at 10:29, Anton Kuchin  wrote:

On 19/01/2023 16:30, Stefan Hajnoczi wrote:

On Thu, 19 Jan 2023 at 07:43, Anton Kuchin  wrote:

On 18/01/2023 17:52, Stefan Hajnoczi wrote:

On Sun, 15 Jan 2023 at 12:21, Anton Kuchin  wrote:

That said, I think that you are making your life harder in the
future when you add the other migratable devices.

I am assuming here that your "underlying device" is:

enum VhostUserFSType {
  VHOST_USER_NO_MIGRATABLE = 0,
  // The one we are doing here
  VHOST_USER_EXTERNAL_MIGRATABLE,
  // The one you describe for the future
  VHOST_USER_INTERNAL_MIGRATABLE,
};

struct VHostUserFS {
  /*< private >*/
  VirtIODevice parent;
  VHostUserFSConf conf;
  struct vhost_virtqueue *vhost_vqs;
  struct vhost_dev vhost_dev;
  VhostUserState vhost_user;
  VirtQueue **req_vqs;
  VirtQueue *hiprio_vq;
  int32_t bootindex;
  enum migration_type;
  /*< public >*/
};


static int vhost_user_fs_pre_save(void *opaque)
{
  VHostUserFS *s = opaque;

  if (s->migration_type == VHOST_USER_NO_MIGRATABLE) {
  error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
   "state of backend to be preserved. If orchestrator can "
   "guarantee this (e.g. dst connects to the same backend "
   "instance or backend state is migrated) set 'vhost-user-fs' 
"
   "migration capability to true to enable migration.");
  return -1;
  }
  if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) {
  return 0;
  }
  if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) {
  error_report("still not implemented");
  return -1;
  }
  assert("we don't reach here");
}

Your initial vmstateDescription

static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
  .unmigratable = 1,
  .minimum_version_id = 0,
  .version_id = 0,
  .fields = (VMStateField[]) {
  VMSTATE_INT8(migration_type, struct VHostUserFS),
  VMSTATE_VIRTIO_DEVICE,
  VMSTATE_END_OF_LIST()
  },
  .pre_save = vhost_user_fs_pre_save,
};

And later you change it to something like:

static bool vhost_fs_user_internal_state_needed(void *opaque)
{
  VHostUserFS *s = opaque;

  return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE;
}

static const VMStateDescription vmstate_vhost_user_fs_internal_sub = {
  .name = "vhost-user-fs/internal",
  .version_id = 1,
  .minimum_version_id = 1,
  .needed = &vhost_fs_user_internal_state_needed,
  .fields = (VMStateField[]) {
   // Whatever
  VMSTATE_END_OF_LIST()
  }
};

static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
  .minimum_version_id = 0,
  .version_id = 0,
  .fields = (VMStateField[]) {
  VMSTATE_INT8(migration_type, struct VHostUserFS),
  VMSTATE_VIRTIO_DEVICE,
  VMSTATE_END_OF_LIST()
  },
  .pre_save = vhost_user_fs_pre_save,
  .subsections = (const VMStateDescription*[]) {
  &vmstate_vhost_user_fs_internal_sub,
  NULL
  }
};

And you are done.

I will propose to use a property to set migration_type, but I didn't
want to write the code right now.


I have a little problem with implementation here and need an advice:

Can we make this property adjustable at runtime after device was realized?
There is a statement in qemu wiki [1] that makes me think this is possible
but I couldn't find any code for it or example in other devices.
> "Properties are, by default, locked while the device is realized. 
Exceptions
> can be made by the devices themselves in order to implement a way for 
a user

> to interact with a device while it is realized."

Or is your idea just to set this property once at construction and keep it
constant for device lifetime?

[1] https://wiki.qemu.org/Features/QOM



I think that this proposal will make Stefan happy, and it is just
adding an extra uint8_t that is helpful to implement everything.

That is exactly the approach I'm trying to implement right now. A single
flag can be convenient for the orchestrator but makes it too hard to
account for all cases for all devices on the qemu side without breaking
future compatibility.
So I'm rewriting it with properties.

Nice. That would be my proposal. Just a bit complicated for a proof of
concept.


BTW do you think each vhost-user device should have its own enum of
migration types or maybe we could make them common for all 

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-02-01 Thread Anton Kuchin

On 01/02/2023 16:26, Juan Quintela wrote:

Anton Kuchin  wrote:

On 19/01/2023 18:02, Stefan Hajnoczi wrote:

On Thu, 19 Jan 2023 at 10:29, Anton Kuchin  wrote:

On 19/01/2023 16:30, Stefan Hajnoczi wrote:

On Thu, 19 Jan 2023 at 07:43, Anton Kuchin  wrote:

On 18/01/2023 17:52, Stefan Hajnoczi wrote:

On Sun, 15 Jan 2023 at 12:21, Anton Kuchin  wrote:

Hi

Sorry to come so late into the discussion.


You are just in time, I'm working on v2 of this patch right now :-)




+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");
+return -1;
+}
+
+return 0;
+}
+
 static const VMStateDescription vuf_vmstate = {
 .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
 };

I don't object to extend the vmstate this way.

But I object to the migration capability for several reasons:
- This feature has _nothing_ to do with migration, the problem, what it
   describes, etc is related to vhost_user_fs.
- The number of migration capabilities is limited
   And we add checks to see if they are valid, consistent, etc
   see qemu/migration/migration.c:migrate_caps_check()
- As Stefan says, we can have several vhost_user_fs devices, and each
   one should know if it can migrate or not.
- We have to options for the orchestator:
   * it knows that all the vhost_user_fs devices can be migration
 Then it can add a property to each vhost_user_fs device
   * it don't know it
 Then it is a good idea that we are not migrating this VM.


I think we'd be better without a new marker because migration code
has standard generic way of solving such puzzles that I described
above. So adding new marker would go against existing practice.
But if you could show me where I missed something I'll be grateful
and will fix it to avoid potential problems.
I'd also be happy to know the opinion of Dr. David Alan Gilbert.

If everybody agrees that any vhost_user_fs device is going to have a
virtio device, then I agree with you that the marker is not needed at
this point.

That said, I think that you are making your life harder in the
future when you add the other migratable devices.

I am assuming here that your "underlying device" is:

enum VhostUserFSType {
 VHOST_USER_NO_MIGRATABLE = 0,
 // The one we are doing here
 VHOST_USER_EXTERNAL_MIGRATABLE,
 // The one you describe for the future
 VHOST_USER_INTERNAL_MIGRATABLE,
};

struct VHostUserFS {
 /*< private >*/
 VirtIODevice parent;
 VHostUserFSConf conf;
 struct vhost_virtqueue *vhost_vqs;
 struct vhost_dev vhost_dev;
 VhostUserState vhost_user;
 VirtQueue **req_vqs;
 VirtQueue *hiprio_vq;
 int32_t bootindex;
 enum migration_type;
 /*< public >*/
};


static int vhost_user_fs_pre_save(void *opaque)
{
 VHostUserFS *s = opaque;

 if (s->migration_type == VHOST_USER_NO_MIGRATABLE) {
 error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
  "state of backend to be preserved. If orchestrator can "
  "guarantee this (e.g. dst connects to the same backend "
  "instance or backend state is migrated) set 'vhost-user-fs' 
"
  "migration capability to true to enable migration.");
 return -1;
 }
 if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) {
 return 0;
 }
 if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) {
 error_report("still not implemented");
 return -1;
 }
 assert("we don't reach here");
}

Your initial vmstateDescription

static const VMStateDescription vuf_vmstate = {
 .name = "vhost-user-fs",
 .unmigratable = 1,
 .minimum_version_id = 0,
 .version_id = 0,
 .fields = (VMStateField[]) {
 VMSTATE_INT8(migration_type, struct VHostUserFS),
 VMSTATE_VIRTIO_DEVICE,
 VMSTATE_END_OF_LIST()
 },
 .pre_save = vhost_user_fs_pre_save,
};

And later you change it to something like:

static bool vhost_fs_user_internal_state_

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-26 Thread Anton Kuchin

On 26/01/2023 17:13, Stefan Hajnoczi wrote:

On Thu, 26 Jan 2023 at 09:20, Anton Kuchin  wrote:

On 25/01/2023 21:46, Stefan Hajnoczi wrote:

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer the FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that the state will be preserved (e.g. it uses migration to
update qemu and the dst will run on the same host as the src and use the
same socket endpoints).

This patch keeps the default behavior that prevents migration with such
devices but adds a migration capability 'vhost-user-fs' to explicitly allow
migration.

Signed-off-by: Anton Kuchin 
---
   hw/virtio/vhost-user-fs.c | 25 -
   qapi/migration.json   |  7 ++-
   2 files changed, 30 insertions(+), 2 deletions(-)

Hi Anton,
Sorry for holding up your work with the discussions that we had. I still
feel it's important to agree on command-line and/or vhost-user protocol
changes that will be able to support non-migratable, stateless
migration/reconnect, and stateful migration vhost-user-fs back-ends. All
three will exist.

As a next step, could you share your code that implements the QEMU side
of stateless migration?

I think that will make it clearer whether a command-line option
(migration capability or per-device) is sufficient or whether the
vhost-user protocol needs to be extended.

If the vhost-user protocol is extended then maybe no user-visible
changes are necessary. QEMU will know if the vhost-user-fs backend
supports migration and which type of migration. It can block migration
in cases where it's not possible.

Thanks,
Stefan


Thank you, Stefan,

That's OK. The discussion is very helpful and showed me some parts
that should be checked to make sure no harm is done by this feature.
I needed some time to step back, review my approach to this feature
with all valuable feedback and ideas that were suggested and check
what other backend implementations can or can't do.
I'll answer today the emails with questions that were addressed to me.

This is all the code that QEMU needs to support stateless migration.
Do you mean backend and/or orchestrator parts?

It's unclear to me how the destination QEMU is able to connect to the
vhost-user back-end while the source QEMU is still connected? I
thought additional QEMU changes would be necessary to make migration
handover work.

Stefan


Oh, yes, that is exactly the part I'm checking right now: I was
testing the scenario where the VM is saved to a file and then restored from
the file without the source and destination running at the same time.




Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-26 Thread Anton Kuchin

On 25/01/2023 21:46, Stefan Hajnoczi wrote:

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer the FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that the state will be preserved (e.g. it uses migration to
update qemu and the dst will run on the same host as the src and use the
same socket endpoints).

This patch keeps the default behavior that prevents migration with such
devices but adds a migration capability 'vhost-user-fs' to explicitly allow
migration.

Signed-off-by: Anton Kuchin 
---
  hw/virtio/vhost-user-fs.c | 25 -
  qapi/migration.json   |  7 ++-
  2 files changed, 30 insertions(+), 2 deletions(-)

Hi Anton,
Sorry for holding up your work with the discussions that we had. I still
feel it's important to agree on command-line and/or vhost-user protocol
changes that will be able to support non-migratable, stateless
migration/reconnect, and stateful migration vhost-user-fs back-ends. All
three will exist.

As a next step, could you share your code that implements the QEMU side
of stateless migration?

I think that will make it clearer whether a command-line option
(migration capability or per-device) is sufficient or whether the
vhost-user protocol needs to be extended.

If the vhost-user protocol is extended then maybe no user-visible
changes are necessary. QEMU will know if the vhost-user-fs backend
supports migration and which type of migration. It can block migration
in cases where it's not possible.

Thanks,
Stefan



Thank you, Stefan,

That's OK. The discussion is very helpful and showed me some parts
that should be checked to make sure no harm is done by this feature.
I needed some time to step back, review my approach to this feature
with all valuable feedback and ideas that were suggested and check
what other backend implementations can or can't do.
I'll answer today the emails with questions that were addressed to me.

This is all the code that QEMU needs to support stateless migration.
Do you mean backend and/or orchestrator parts?

I'll think of how protocol extension can help us make this feature
transparent to users.




Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-23 Thread Anton Kuchin



On 23/01/2023 16:09, Stefan Hajnoczi wrote:

On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin  wrote:

On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:

On 22/01/2023 16:46, Michael S. Tsirkin wrote:

On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:

This flag should be set when qemu doesn't need to worry about any
external state stored in vhost-user daemons during migration:
don't fail migration, just pack generic virtio device states to
migration stream and orchestrator guarantees that the rest of the
state will be present at the destination to restore full context and
continue running.

Sorry, I still do not get it. So fundamentally, why do we need this property?
vhost-user-fs is not created by default, such that we'd then need to opt in
to the special "migratable" case.
That's why I said it might make some sense as a device property as qemu
tracks whether device is unplugged for us.

But as written, if you are going to teach the orchestrator about
vhost-user-fs and its special needs, just teach it when to migrate and
where not to migrate.

Either we describe the special situation to qemu and let qemu
make an intelligent decision whether to allow migration,
or we trust the orchestrator. And if it's the latter, then 'migrate'
already says orchestrator decided to migrate.

The problem I'm trying to solve is that most vhost-user devices
now block migration in qemu. And this makes sense since qemu can't
extract and transfer the backend daemon state. But this prevents us from
updating the qemu executable via local migration. So this flag is
intended more as a safety check that says "I know what I'm doing".

I agree that it is not really necessary if we trust the orchestrator
to request migration only when the migration can be performed in a
safe way. But changing the current behavior of vhost-user-fs from
"always blocks migration" to "migrates partial state whenever the
orchestrator requests it" seems a little dangerous and can be
misinterpreted as full support for migration in all cases.

It's not really different from block, is it? The orchestrator has to arrange
for backend migration. I think we just assumed there's no use case where
this is practical for vhost-user-fs, so we blocked it.
But in any case it's the orchestrator's responsibility.

Yes, you are right. So do you think we should just drop the blocker
without adding a new flag?

I'd be inclined to. I am curious what dgilbert and stefanha think, though.

If the migration blocker is removed, what happens when a user attempts
to migrate with a management tool and/or vhost-user-fs server
implementation that don't support migration?


There will be no matching FUSE session in the destination endpoint, so all
requests to this fs will fail until it is remounted from the guest to
send a new FUSE_INIT message that does session setup.



Anton: Can you explain how stateless migration will work on the
vhost-user-fs back-end side? Is it reusing vhost-user reconnect
functionality or introducing a new mode for stateless migration? I
guess the vhost-user-fs back-end implementation is required to
implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all
in-flight requests when vrings are stopped?


It reuses the existing vhost-user reconnect code to resubmit inflight
requests.
Sure, the backend needs to support this feature: presence of the required
features is checked by generic vhost and vhost-user code during init,
and if something is missing a migration blocker is assigned to the
device (not the static one in vmstate that I remove in this patch, but
another per-device kind of blocker).



Stefan




Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-22 Thread Anton Kuchin



On 22/01/2023 16:46, Michael S. Tsirkin wrote:

On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:

This flag should be set when qemu doesn't need to worry about any
external state stored in vhost-user daemons during migration:
don't fail migration, just pack generic virtio device states to
migration stream and orchestrator guarantees that the rest of the
state will be present at the destination to restore full context and
continue running.

Sorry, I still do not get it. So fundamentally, why do we need this property?
vhost-user-fs is not created by default, such that we'd then need to opt in
to the special "migratable" case.
That's why I said it might make some sense as a device property as qemu
tracks whether device is unplugged for us.

But as written, if you are going to teach the orchestrator about
vhost-user-fs and its special needs, just teach it when to migrate and
where not to migrate.

Either we describe the special situation to qemu and let qemu
make an intelligent decision whether to allow migration,
or we trust the orchestrator. And if it's the latter, then 'migrate'
already says orchestrator decided to migrate.

The problem I'm trying to solve is that most vhost-user devices
now block migration in qemu. And this makes sense since qemu can't
extract and transfer the backend daemon state. But this prevents us from
updating the qemu executable via local migration. So this flag is
intended more as a safety check that says "I know what I'm doing".

I agree that it is not really necessary if we trust the orchestrator
to request migration only when the migration can be performed in a
safe way. But changing the current behavior of vhost-user-fs from
"always blocks migration" to "migrates partial state whenever the
orchestrator requests it" seems a little dangerous and can be
misinterpreted as full support for migration in all cases.

It's not really different from block, is it? The orchestrator has to arrange
for backend migration. I think we just assumed there's no use case where
this is practical for vhost-user-fs, so we blocked it.
But in any case it's the orchestrator's responsibility.


Yes, you are right. So do you think we should just drop the blocker
without adding a new flag?




Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-22 Thread Anton Kuchin

On 22/01/2023 10:16, Michael S. Tsirkin wrote:

On Fri, Jan 20, 2023 at 07:37:18PM +0200, Anton Kuchin wrote:

On 20/01/2023 15:58, Michael S. Tsirkin wrote:

On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:

On 19/01/2023 14:51, Michael S. Tsirkin wrote:

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer the FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that the state will be preserved (e.g. it uses migration to
update qemu and the dst will run on the same host as the src and use the
same socket endpoints).

This patch keeps the default behavior that prevents migration with such
devices but adds a migration capability 'vhost-user-fs' to explicitly allow
migration.

Signed-off-by: Anton Kuchin 
---
hw/virtio/vhost-user-fs.c | 25 -
qapi/migration.json   |  7 ++-
2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
#include "hw/virtio/vhost-user-fs.h"
#include "monitor/monitor.h"
#include "sysemu/sysemu.h"
+#include "migration/migration.h"
static const int user_feature_bits[] = {
VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
return &fs->vhost_dev;
}
+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");

Isn't it possible that some backends are same and some are not?
Shouldn't this be a device property then?

If some are not the same, it is not guaranteed that the correct FUSE
state is present there, so the orchestrator shouldn't set the capability
because this can result in destination devices being broken (they'll
be fine after a remount in the guest, but this is guest-visible and is
not acceptable).

I can imagine a smart orchestrator and backend that can transfer
internal FUSE state, but we are not there yet; it would then be
their responsibility to ensure endpoint compatibility between src
and dst and set the capability (that's why I put "e.g." and "or" in
the error description).

So instead of relying on the orchestrator how about making it a device
property?

We have to rely on the orchestrator here and I can't see how a property
can help us with this: the same device can be migratable or not depending
on whether the destination is the same host, what features the backend
supports, how the management software works and other factors of the
environment that are not accessible from qemu or the backend daemon.
So in the end we'll end up with the orchestrator having to set up flags for
each device before each migration based on information only it can
have - in my opinion this is worse than just giving the orchestrator
a single flag that it can set after calculating the decision for
the particular migration that it organizes.






+return -1;
+}
+
+return 0;
+}
+
static const VMStateDescription vuf_vmstate = {
.name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
};
static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
#will be handled faster.  This is a performance feature and
#should not affect the correctness of postcopy migration.
#(since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+# vhost-user-fs devices; this should be enabled only when the
+# backend can preserve local FUSE state, e.g. for qemu update
+# when dst reconnects to the same endpoints after migration.
+# (since 8.0)
#
# Features:
# @unstable: Members @x-colo and @x-ignore-shar

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-20 Thread Anton Kuchin

On 20/01/2023 15:58, Michael S. Tsirkin wrote:

On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:

On 19/01/2023 14:51, Michael S. Tsirkin wrote:

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer the FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that the state will be preserved (e.g. it uses migration to
update qemu and the dst will run on the same host as the src and use the
same socket endpoints).

This patch keeps the default behavior that prevents migration with such
devices but adds a migration capability 'vhost-user-fs' to explicitly allow
migration.

Signed-off-by: Anton Kuchin 
---
   hw/virtio/vhost-user-fs.c | 25 -
   qapi/migration.json   |  7 ++-
   2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
   #include "hw/virtio/vhost-user-fs.h"
   #include "monitor/monitor.h"
   #include "sysemu/sysemu.h"
+#include "migration/migration.h"
   static const int user_feature_bits[] = {
   VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
   return &fs->vhost_dev;
   }
+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");

Isn't it possible that some backends are same and some are not?
Shouldn't this be a device property then?

If some are not the same, it is not guaranteed that the correct FUSE
state is present there, so the orchestrator shouldn't set the capability
because this can result in destination devices being broken (they'll
be fine after a remount in the guest, but this is guest-visible and is
not acceptable).

I can imagine a smart orchestrator and backend that can transfer
internal FUSE state, but we are not there yet; it would then be
their responsibility to ensure endpoint compatibility between src
and dst and set the capability (that's why I put "e.g." and "or" in
the error description).

So instead of relying on the orchestrator how about making it a device
property?


We have to rely on the orchestrator here and I can't see how a property
can help us with this: the same device can be migratable or not depending
on whether the destination is the same host, what features the backend
supports, how the management software works and other factors of the
environment that are not accessible from qemu or the backend daemon.
So in the end we'll end up with the orchestrator having to set up flags for
each device before each migration based on information only it can
have - in my opinion this is worse than just giving the orchestrator
a single flag that it can set after calculating the decision for
the particular migration that it organizes.








+return -1;
+}
+
+return 0;
+}
+
   static const VMStateDescription vuf_vmstate = {
   .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+   .pre_save = vhost_user_fs_pre_save,
   };
   static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
   #will be handled faster.  This is a performance feature and
   #should not affect the correctness of postcopy migration.
   #(since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+# vhost-user-fs devices; this should be enabled only when the
+# backend can preserve local FUSE state, e.g. for qemu update
+# when dst reconnects to the same endpoints after migration.
+# (since 8.0)
   #
   # Features:
   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -492,7 +497,7 @@
  'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activat

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-19 Thread Anton Kuchin



On 19/01/2023 21:00, Dr. David Alan Gilbert wrote:

* Anton Kuchin (antonkuc...@yandex-team.ru) wrote:

On 19/01/2023 14:51, Michael S. Tsirkin wrote:

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer the FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that the state will be preserved (e.g. it uses migration to
update qemu and the dst will run on the same host as the src and use the
same socket endpoints).

This patch keeps the default behavior that prevents migration with such
devices but adds a migration capability 'vhost-user-fs' to explicitly allow
migration.

Signed-off-by: Anton Kuchin 
---
   hw/virtio/vhost-user-fs.c | 25 -
   qapi/migration.json   |  7 ++-
   2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
   #include "hw/virtio/vhost-user-fs.h"
   #include "monitor/monitor.h"
   #include "sysemu/sysemu.h"
+#include "migration/migration.h"
   static const int user_feature_bits[] = {
   VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
   return &fs->vhost_dev;
   }
+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");

Isn't it possible that some backends are same and some are not?
Shouldn't this be a device property then?

If some are not the same, it is not guaranteed that correct FUSE
state is present there, so the orchestrator shouldn't set the capability,
because this can result in destination devices being broken (they'll
be fine after a remount in the guest, but this is guest-visible and is
not acceptable).

Shouldn't this be something that's negotiated as a feature bit in the
vhost-user protocol - i.e. to say whether the device can do migration?

However, I think what you're saying is that a migration might only be
doable in some cases - e.g. a local migration - and that's hard for
either qemu or the daemon to discover by itself; so it's right that
an orchestrator enables it.
(I'm not sure the error_report message needs to be quite so verbose -
normally a clear one-line message is enough for people to then look
things up).



Exactly, migration depends not only on device capabilities but also
on backends, hosts, management and other parts of the environment not known to
qemu or the backend, so I can't see how this can be a device config option or
negotiated via the protocol.

In the message I tried to make clear that just enabling the capability
without proper setup can be dangerous, to reduce the temptation to use it
recklessly and end up with a broken virtiofs device later. I'll try to shorten
the message. I would appreciate it if you could help me with the wording
(I'm not a native speaker, so building a short sentence with the proper
level of warning is challenging)




I can imagine a smart orchestrator and backend that could transfer
internal FUSE state, but we are not there yet, and it would then be
their responsibility to ensure endpoint compatibility between src
and dst before setting the capability (that's why I put "e.g." and "or" in
the error description).

You also need the vhost-user device to be able to do the dirty bitmap
updates; is that being checked somewhere?

Dave


Yes, this is done by the generic vhost and vhost-user code on device init.
In the vhost_user_backend_init function, if the backend doesn't support the
VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature, a migration blocker
is assigned to the device. And in vhost_dev_init there is one more check,
for the VHOST_F_LOG_ALL feature, that assigns another migration blocker if
this is not supported.
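
Roughly, the two checks work like this (a paraphrased sketch of the logic
in hw/virtio/vhost-user.c and hw/virtio/vhost.c, not verbatim code):

/* vhost-user init: backend must support the log shmfd protocol feature */
if (!virtio_has_feature(dev->protocol_features,
                        VHOST_USER_PROTOCOL_F_LOG_SHMFD) &&
    !dev->migration_blocker) {
    error_setg(&dev->migration_blocker,
               "Migration disabled: vhost-user backend lacks "
               "VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.");
}

/* vhost_dev_init: the device must support dirty logging at all */
if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) {
    error_setg(&hdev->migration_blocker,
               "Migration disabled: vhost lacks VHOST_F_LOG_ALL feature.");
}
/* whatever blocker was set above is then registered with migration */
if (hdev->migration_blocker != NULL) {
    r = migrate_add_blocker(hdev->migration_blocker, errp);
}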

+return -1;
+}
+
+return 0;
+}
+
   static const VMStateDescription vuf_vmstate = {
   .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+.pre_save = vhost_user_fs_pre_save,
   };
   static Property vuf_properties[] = {

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-19 Thread Anton Kuchin

On 19/01/2023 18:02, Stefan Hajnoczi wrote:

On Thu, 19 Jan 2023 at 10:29, Anton Kuchin  wrote:

On 19/01/2023 16:30, Stefan Hajnoczi wrote:

On Thu, 19 Jan 2023 at 07:43, Anton Kuchin  wrote:

On 18/01/2023 17:52, Stefan Hajnoczi wrote:

On Sun, 15 Jan 2023 at 12:21, Anton Kuchin  wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps the default behavior that prevents migration with such devices
but adds the migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
hw/virtio/vhost-user-fs.c | 25 -
qapi/migration.json   |  7 ++-
2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
#include "hw/virtio/vhost-user-fs.h"
#include "monitor/monitor.h"
#include "sysemu/sysemu.h"
+#include "migration/migration.h"

static const int user_feature_bits[] = {
VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
return &fs->vhost_dev;
}

+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");
+return -1;
+}
+
+return 0;
+}
+
static const VMStateDescription vuf_vmstate = {
.name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+.pre_save = vhost_user_fs_pre_save,
};

Will it be possible to extend this vmstate when virtiofsd adds support
for stateful migration without breaking migration compatibility?

If not, then I think a marker field should be added to the vmstate:
0 - stateless/reconnect migration (the approach you're adding in this patch)
1 - stateful migration (future virtiofsd feature)

When the field is 0 there are no further vmstate fields and we trust
that the destination vhost-user-fs server already has the necessary
state.

When the field is 1 there are additional vmstate fields that contain
the virtiofsd state.

The goal is for QEMU to support 3 migration modes, depending on the
vhost-user-fs server:
1. No migration support.
2. Stateless migration.
3. Stateful migration.

Sure. These vmstate fields are very generic and mandatory for any
virtio device. If in the future more state can be transferred in the migration
stream, the vmstate can be extended with additional fields. This can
be done with new subsections and/or bumping version_id.

My concern is that the vmstate introduced in this patch may be
unusable when stateful migration is added. So additional compatibility
code will need to be introduced to make your stateless migration
continue working with extended vmstate.

By adding a marker field in this patch it should be possible to
continue using the same vmstate for stateless migration without adding
extra compatibility code in the future.

I understand, but these fields in the vmstate just pack generic virtio
device state that is accessible by qemu. All additional data could be
added later by extra fields. I believe we couldn't pull off any type
of virtio device migration without transferring virtqueues, so more
sophisticated types of migration would require adding more data,
not modifying this part of the vmstate.

What I'm saying is that your patch could define the vmstate such that
it contains a field to differentiate between stateless and
stateful migration. That way QEMU versions that only support stateless
migration (this patch) will be able to migrate to future QEMU versions
that support both stateless and stateful without compatibility issues.

I double-checked migration documentation for subsections at
https://www.qemu.org/docs/master/devel/migration.html#subsections
and believe it perfectly describes our case: virtio device state
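
For illustration, a hypothetical sketch of how a subsection could carry
future backend state while keeping today's stateless stream compatible
(all names below are illustrative, not code from this patch):

static bool vuf_backend_state_needed(void *opaque)
{
    /* would return true only when the backend supports stateful migration */
    return false;
}

static const VMStateDescription vuf_backend_state = {
    .name = "vhost-user-fs/backend",
    .version_id = 0,
    .needed = vuf_backend_state_needed,
    .fields = (VMStateField[]) {
        /* future fields with serialized FUSE state would go here */
        VMSTATE_END_OF_LIST()
    },
};

static const VMStateDescription vuf_vmstate = {
    .name = "vhost-user-fs",
    .version_id = 0,
    .fields = (VMStateField[]) {
        VMSTATE_VIRTIO_DEVICE,
        VMSTATE_END_OF_LIST()
    },
    .subsections = (const VMStateDescription * []) {
        &vuf_backend_state,   /* skipped on the wire while not needed */
        NULL
    },
    .pre_save = vhost_user_fs_pre_save,
};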

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-19 Thread Anton Kuchin

On 19/01/2023 16:30, Stefan Hajnoczi wrote:

On Thu, 19 Jan 2023 at 07:43, Anton Kuchin  wrote:

On 18/01/2023 17:52, Stefan Hajnoczi wrote:

On Sun, 15 Jan 2023 at 12:21, Anton Kuchin  wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps the default behavior that prevents migration with such devices
but adds the migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
   hw/virtio/vhost-user-fs.c | 25 -
   qapi/migration.json   |  7 ++-
   2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
   #include "hw/virtio/vhost-user-fs.h"
   #include "monitor/monitor.h"
   #include "sysemu/sysemu.h"
+#include "migration/migration.h"

   static const int user_feature_bits[] = {
   VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
   return &fs->vhost_dev;
   }

+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");
+return -1;
+}
+
+return 0;
+}
+
   static const VMStateDescription vuf_vmstate = {
   .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+.pre_save = vhost_user_fs_pre_save,
   };

Will it be possible to extend this vmstate when virtiofsd adds support
for stateful migration without breaking migration compatibility?

If not, then I think a marker field should be added to the vmstate:
0 - stateless/reconnect migration (the approach you're adding in this patch)
1 - stateful migration (future virtiofsd feature)

When the field is 0 there are no further vmstate fields and we trust
that the destination vhost-user-fs server already has the necessary
state.

When the field is 1 there are additional vmstate fields that contain
the virtiofsd state.

The goal is for QEMU to support 3 migration modes, depending on the
vhost-user-fs server:
1. No migration support.
2. Stateless migration.
3. Stateful migration.

Sure. These vmstate fields are very generic and mandatory for any
virtio device. If in the future more state can be transferred in the migration
stream, the vmstate can be extended with additional fields. This can
be done with new subsections and/or bumping version_id.

My concern is that the vmstate introduced in this patch may be
unusable when stateful migration is added. So additional compatibility
code will need to be introduced to make your stateless migration
continue working with extended vmstate.

By adding a marker field in this patch it should be possible to
continue using the same vmstate for stateless migration without adding
extra compatibility code in the future.

I understand, but these fields in the vmstate just pack generic virtio
device state that is accessible by qemu. All additional data could be
added later by extra fields. I believe we couldn't pull off any type
of virtio device migration without transferring virtqueues, so more
sophisticated types of migration would require adding more data,
not modifying this part of the vmstate.




The main purpose of this patch is to allow updating the VM to a newer version
of qemu via local migration without disruption to the guest. And future
versions hopefully could pack more state from the external environment
into the migration stream.



   static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
   #will be handled faster.  This is a performance feature and
   #should not affect the correctness of postcopy migration.
   #(since 7.1)
+# @vhost-u

Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-19 Thread Anton Kuchin

On 19/01/2023 14:51, Michael S. Tsirkin wrote:

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps the default behavior that prevents migration with such devices
but adds the migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
  hw/virtio/vhost-user-fs.c | 25 -
  qapi/migration.json   |  7 ++-
  2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
  #include "hw/virtio/vhost-user-fs.h"
  #include "monitor/monitor.h"
  #include "sysemu/sysemu.h"
+#include "migration/migration.h"
  
  static const int user_feature_bits[] = {

  VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
  return &fs->vhost_dev;
  }
  
+static int vhost_user_fs_pre_save(void *opaque)

+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");

Isn't it possible that some backends are same and some are not?
Shouldn't this be a device property then?

If some are not the same, it is not guaranteed that correct FUSE
state is present there, so the orchestrator shouldn't set the capability,
because this can result in destination devices being broken (they'll
be fine after a remount in the guest, but this is guest-visible and is
not acceptable).

I can imagine a smart orchestrator and backend that could transfer
internal FUSE state, but we are not there yet, and it would then be
their responsibility to ensure endpoint compatibility between src
and dst and set the capability (that's why I put "e.g." and "or" in
the error description).

+return -1;
+}
+
+return 0;
+}
+
  static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+.pre_save = vhost_user_fs_pre_save,
  };
  
  static Property vuf_properties[] = {

diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
  #will be handled faster.  This is a performance feature and
  #should not affect the correctness of postcopy migration.
  #(since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+# vhost-user-fs devices. This should be enabled only when the
+# backend can preserve local FUSE state, e.g. for a qemu update
+# when dst reconnects to the same endpoints after migration.
+# (since 8.0)
  #
  # Features:
  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -492,7 +497,7 @@
 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
 { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
 'validate-uuid', 'background-snapshot',
-   'zero-copy-send', 'postcopy-preempt'] }
+   'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }

I kind of dislike that it's such a specific flag. Is only vhost-user-fs
ever going to be affected? Any way to put it in a way that is more generic?

Here I agree with you: I would prefer less narrow naming too, but I
didn't manage to come up with one. It looks like many other vhost-user
devices could benefit from this, so maybe "vhost-user-stateless" or
something like that would be better.
I'm not sure that other types of devices could handle reconnecting to
the old endpoint as easily as vhost-user-fs, but in any case the support for
this flag needs to be implemented for each device individually.
What do you think? Any ideas would be appreciated.





  ##
  # @MigrationCapabilityStatus:
--
2.34.1




Re: [PATCH] vhost-user-fs: add capability to allow migration

2023-01-19 Thread Anton Kuchin

On 18/01/2023 17:52, Stefan Hajnoczi wrote:

On Sun, 15 Jan 2023 at 12:21, Anton Kuchin  wrote:

Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps the default behavior that prevents migration with such devices
but adds the migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
  hw/virtio/vhost-user-fs.c | 25 -
  qapi/migration.json   |  7 ++-
  2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
  #include "hw/virtio/vhost-user-fs.h"
  #include "monitor/monitor.h"
  #include "sysemu/sysemu.h"
+#include "migration/migration.h"

  static const int user_feature_bits[] = {
  VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
  return &fs->vhost_dev;
  }

+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal FUSE 
"
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 'vhost-user-fs' 
"
+ "migration capability to true to enable migration.");
+return -1;
+}
+
+return 0;
+}
+
  static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+.pre_save = vhost_user_fs_pre_save,
  };

Will it be possible to extend this vmstate when virtiofsd adds support
for stateful migration without breaking migration compatibility?

If not, then I think a marker field should be added to the vmstate:
0 - stateless/reconnect migration (the approach you're adding in this patch)
1 - stateful migration (future virtiofsd feature)

When the field is 0 there are no further vmstate fields and we trust
that the destination vhost-user-fs server already has the necessary
state.

When the field is 1 there are additional vmstate fields that contain
the virtiofsd state.

The goal is for QEMU to support 3 migration modes, depending on the
vhost-user-fs server:
1. No migration support.
2. Stateless migration.
3. Stateful migration.


Sure. These vmstate fields are very generic and mandatory for any
virtio device. If in the future more state can be transferred in the migration
stream, the vmstate can be extended with additional fields. This can
be done with new subsections and/or bumping version_id.

The main purpose of this patch is to allow updating the VM to a newer version
of qemu via local migration without disruption to the guest. And future
versions hopefully could pack more state from the external environment
into the migration stream.





  static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
  #will be handled faster.  This is a performance feature and
  #should not affect the correctness of postcopy migration.
  #(since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+# vhost-user-fs devices. This should be enabled only when the
+# backend can preserve local FUSE state, e.g. for a qemu update
+# when dst reconnects to the same endpoints after migration.
+# (since 8.0)

This is global but a guest can have multiple vhost-user-fs devices
connected to different servers.

AFAIK vhost-user requires a unix socket and memory shared from the guest, so
devices can't be connected to different servers, just to different
endpoints on the current host.



I would add a qdev property to the device instead of introducing a
migration capability. The property would enable "stateless migration".
When the property is not set, migration would be prohibited.

I did think about that, but this is really not a property of the device,
this is a capability of the management software an

[PATCH] vhost-user-fs: add capability to allow migration

2023-01-15 Thread Anton Kuchin
Now any vhost-user-fs device makes the VM unmigratable, which also prevents
qemu updates without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give the orchestrator an option to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps the default behavior that prevents migration with such devices
but adds the migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin 
---
 hw/virtio/vhost-user-fs.c | 25 -
 qapi/migration.json   |  7 ++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
 #include "hw/virtio/vhost-user-fs.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
+#include "migration/migration.h"
 
 static const int user_feature_bits[] = {
 VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
 return &fs->vhost_dev;
 }
 
+static int vhost_user_fs_pre_save(void *opaque)
+{
+MigrationState *s = migrate_get_current();
+
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+error_report("Migration of vhost-user-fs devices requires internal 
FUSE "
+ "state of backend to be preserved. If orchestrator can "
+ "guarantee this (e.g. dst connects to the same backend "
+ "instance or backend state is migrated) set 
'vhost-user-fs' "
+ "migration capability to true to enable migration.");
+return -1;
+}
+
+return 0;
+}
+
 static const VMStateDescription vuf_vmstate = {
 .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_END_OF_LIST()
+},
+.pre_save = vhost_user_fs_pre_save,
 };
 
 static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
 #will be handled faster.  This is a performance feature and
 #should not affect the correctness of postcopy migration.
 #(since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+# vhost-user-fs devices. This should be enabled only when the
+# backend can preserve local FUSE state, e.g. for a qemu update
+# when dst reconnects to the same endpoints after migration.
+# (since 8.0)
 #
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -492,7 +497,7 @@
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
-   'zero-copy-send', 'postcopy-preempt'] }
+   'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.34.1




Re: [Virtio-fs] [PATCH] vhost-user-fs: fix features handling

2021-04-10 Thread Anton Kuchin



On 09/04/2021 18:56, Vivek Goyal wrote:

On Thu, Apr 08, 2021 at 10:55:34PM +0300, Anton Kuchin wrote:

Make virtio-fs take into account server capabilities.

Just returning the requested features assumes that all of them are implemented
by the server, and results in setting an unsupported configuration if some of
them are absent.

Signed-off-by: Anton Kuchin

[CC stefan and qemu-devel.]

Can you give more details of what problem exactly you are facing? Or is
this fix about avoiding a future problem where the device can refuse
to support a feature qemu is requesting?


This fixes an existing problem: qemu ignores the features (un)supported by the
backend, and this works fine only if the backend features match the features of
qemu. Otherwise qemu incorrectly assumes that the backend supports all of
them and calls vhost_set_features() without taking into account the result of
the previous vhost_get_features() call. This breaks the protocol and can crash
the server or cause incorrect behavior.


This is why I hope it can be accepted in time for the 6.0 release.


IIUC, this patch is preparing a list of features the vhost-user-fs device
can support. Then it calls vhost_get_features() which makes sure that
all these features are supported by the real vhost-user device (hdev->features).
If not, then the corresponding feature is reset and the remaining features
are returned to the caller.
When this callback is executed in virtio_bus_device_plugged(), the list of
features that the vhost backend supports has already been obtained earlier
by vhost_user_get_features() in vuf_device_realize() and stored in
hdev->features. vuf_get_features() should return the bitmask of features
that are common to the vhost backend (hdev->features) and the frontend
(vdev->host_features). But instead it just returns the host features.
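
In other words, vhost_get_features() masks the offered bits against what the
backend advertised; a simplified sketch of that semantics (not the actual
implementation):

static uint64_t mask_features(uint64_t backend_features,
                              const int *feature_bits, uint64_t features)
{
    /* clear every listed bit that the backend did not advertise */
    for (const int *bit = feature_bits;
         *bit != VHOST_INVALID_FEATURE_BIT; bit++) {
        if (!(backend_features & (1ULL << *bit))) {
            features &= ~(1ULL << *bit);
        }
    }
    return features;
}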

This feature negotiation bit is called in so many places that I am kind of
lost as to who should be doing what. I will leave it to Stefan who
understands it much better.



---
  hw/virtio/vhost-user-fs.c | 17 +
  1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index ac4fc34b36..6cf983ba0e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,14 @@
  #include "monitor/monitor.h"
  #include "sysemu/sysemu.h"
  
+static const int user_feature_bits[] = {

+VIRTIO_F_VERSION_1,
+VIRTIO_RING_F_INDIRECT_DESC,
+VIRTIO_RING_F_EVENT_IDX,
+VIRTIO_F_NOTIFY_ON_EMPTY,
+VHOST_INVALID_FEATURE_BIT
+};
+
  static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
  {
  VHostUserFS *fs = VHOST_USER_FS(vdev);
@@ -129,11 +137,12 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
  }
  
  static uint64_t vuf_get_features(VirtIODevice *vdev,

-  uint64_t requested_features,
-  Error **errp)
+ uint64_t features,

Would it make sense to keep the name requested_features? That kind of
makes it clear that the caller is requesting these features.

I feel there should be a few lines of comments as well, to make it clear
what this function is actually doing.


This fix was inspired by similar functions in
hw/scsi/vhost-scsi-common.c, hw/virtio/vhost-user-vsock.c,
hw/block/vhost-user-blk.c and hw/net/vhost_net.c, and I borrowed the naming
from them just to be consistent.


IMO this looks like a good place for refactoring, because we have more
and more vhost-user devices that duplicate this code, but that can be done
after the bug is fixed.



Vivek


+ Error **errp)
  {
-/* No feature bits used yet */
-return requested_features;
+VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
  }
  
  static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)

--
2.25.1


___
Virtio-fs mailing list
virtio...@redhat.com
https://listman.redhat.com/mailman/listinfo/virtio-fs




[PATCH] vhost-user-fs: fix features handling

2021-04-08 Thread Anton Kuchin
Make virtio-fs take into account server capabilities.

Just returning the requested features assumes that all of them are implemented
by the server, and results in setting an unsupported configuration if some of
them are absent.

Signed-off-by: Anton Kuchin 
---
 hw/virtio/vhost-user-fs.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index ac4fc34b36..6cf983ba0e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,14 @@
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
 
+static const int user_feature_bits[] = {
+VIRTIO_F_VERSION_1,
+VIRTIO_RING_F_INDIRECT_DESC,
+VIRTIO_RING_F_EVENT_IDX,
+VIRTIO_F_NOTIFY_ON_EMPTY,
+VHOST_INVALID_FEATURE_BIT
+};
+
 static void vuf_get_config(VirtIODevice *vdev, uint8_t *config)
 {
 VHostUserFS *fs = VHOST_USER_FS(vdev);
@@ -129,11 +137,12 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
 }
 
 static uint64_t vuf_get_features(VirtIODevice *vdev,
-  uint64_t requested_features,
-  Error **errp)
+ uint64_t features,
+ Error **errp)
 {
-/* No feature bits used yet */
-return requested_features;
+VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+return vhost_get_features(&fs->vhost_dev, user_feature_bits, features);
 }
 
 static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq)
-- 
2.25.1




vhost-user: questions regarding migration

2020-09-19 Thread Anton Kuchin
Hi,

I'm implementing migration support in vhost-user backend and have a
couple of questions:

1. How can the master be sure that logging has started?

We expect that right after the set_features command with the VHOST_F_LOG_ALL
flag all memory modifications will be tracked in the log, but the slave can
need a little time to process this command, so there is a chance that some
requests remain untracked. Is there a way to ensure all requests are
logged, or to determine the moment from which tracking starts, so the master
can start migrating memory?

2. Why do we need a separate log_addr for the vring, and how can it not be
covered by the mem table?

As far as I understand, the slave receives the used ring address in the
set_vring_addr command, and to map it correctly we need a valid entry in the
memory table anyway. So this field looks redundant to me. Am I missing something?

BTW the word "log_guest_addr" is mentioned only once in the document, and
in the "vring address description" payload it is just called "log";
shouldn't we change these names to match?





[RFC PATCH] virtio: Change order of appling runstate to device and bus

2019-12-20 Thread Anton Kuchin
On the transition to running, first apply the state to the bus and then to the
device, so the device can access bus functions correctly. When going to
stopped, notify the device first and then the bus.

Signed-off-by: Anton Kuchin 
---
 hw/virtio/virtio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 04716b5f6c..2ea2edba10 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3096,7 +3096,7 @@ static void virtio_vmstate_change(void *opaque, int running, RunState state)
 bool backend_run = running && virtio_device_started(vdev, vdev->status);
 vdev->vm_running = running;
 
-if (backend_run) {
+if (!backend_run) {
 virtio_set_status(vdev, vdev->status);
 }
 
@@ -3104,7 +3104,7 @@ static void virtio_vmstate_change(void *opaque, int running, RunState state)
 k->vmstate_change(qbus->parent, backend_run);
 }
 
-if (!backend_run) {
+if (backend_run) {
 virtio_set_status(vdev, vdev->status);
 }
 }
-- 
2.20.1




[Qemu-devel] [PATCH] virtio-net: remove redundant qdev from VirtIONet

2019-07-16 Thread Anton Kuchin
Signed-off-by: Anton Kuchin 
---
 hw/net/virtio-net.c| 3 +--
 include/hw/virtio/virtio-net.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b9e1cd71cf..16d2ad5927 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -387,7 +387,7 @@ static void rxfilter_notify(NetClientState *nc)
 VirtIONet *n = qemu_get_nic_opaque(nc);
 
 if (nc->rxfilter_notify_enabled) {
-gchar *path = object_get_canonical_path(OBJECT(n->qdev));
+gchar *path = object_get_canonical_path(OBJECT(n));
 qapi_event_send_nic_rx_filter_changed(!!n->netclient_name,
   n->netclient_name, path);
 g_free(path);
@@ -2759,7 +2759,6 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 nc->rxfilter_notify_enabled = 1;
 
 QTAILQ_INIT(&n->rsc_chains);
-n->qdev = dev;
 }
 
 static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b96f0c643f..4a1b599d48 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -174,7 +174,6 @@ struct VirtIONet {
 uint32_t *vlans;
 virtio_net_conf net_conf;
 NICConf nic_conf;
-DeviceState *qdev;
 int multiqueue;
 uint16_t max_queues;
 uint16_t curr_queues;
-- 
2.20.1




[Qemu-devel] What events should be handled in iohandler context?

2019-07-15 Thread Anton Kuchin

Hi,

I'm trying to understand contexts and handlers/notifiers and am a bit
confused about the two contexts living in the main loop: qemu_aio_context and
iohandler_ctx. It is mentioned in the iohandler_ctx comment that
qemu_aio_context can't be reused because "iohandlers mustn't be polled
by aio_poll(qemu_aio_context)" but there is no explanation why.


I tried to find examples, and failed to understand why virtio-net
eventfds are registered to iohandler_ctx with the generic virtio callback
virtio_device_start_ioeventfd_impl() while the TX bottom-half and handlers of
the back-end TAP use qemu_aio_context.


Can you explain a little why some fds need to be polled and some
should not be polled? And how can I choose which context is right for me?


Thanks in advance for your help!

Anton




[Qemu-devel] [PATCH] block: remove bs from lists before closing

2019-05-07 Thread Anton Kuchin
Close involves a flush that can be performed asynchronously, and the bs
must be protected from being referenced while it is being deleted.

Signed-off-by: Anton Kuchin 
---
 block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 9ae5c0ed2f..b505271a4d 100644
--- a/block.c
+++ b/block.c
@@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
 assert(bdrv_op_blocker_is_empty(bs));
 assert(!bs->refcnt);
 
-bdrv_close(bs);
-
 /* remove from list, if necessary */
 if (bs->node_name[0] != '\0') {
 QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
 }
 QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);
 
+bdrv_close(bs);
+
 g_free(bs);
 }
 
-- 
2.19.1




Re: [Qemu-devel] aio context ownership during bdrv_close()

2019-05-06 Thread Anton Kuchin

On 29/04/2019 13:38, Kevin Wolf wrote:

Am 26.04.2019 um 14:24 hat Anton Kuchin geschrieben:

I can't figure out the ownership of the aio context during bdrv_close().

As far as I understand, bdrv_unref() should be called with the aio context
acquired, to prevent concurrent operations (at least most usages in blockdev.c
explicitly acquire and release the context, but not all).

I think the theory is like this:

1. bdrv_unref() can only be called from the main thread

2. A block node for which bdrv_close() is called has no references. If
there are no more parents that keep it in a non-default iothread,
they should be in the main AioContext. So no locks need to be taken
during bdrv_close().

In practice, 2. is not fully true today, even though block devices do
stop their dataplane and move the block nodes back to the main
AioContext on shutdown. I am currently working on some fixes related to
this, afterwards the situation should be better.


But if the refcount reaches zero and the bs is going to be deleted in
bdrv_close(), we need to ensure that the drain is finished, data is flushed
and there are no more pending coroutines and bottom halves, so the drain and
flush functions can enter a coroutine and yield in several places. As a result
control returns to the coroutine caller, which will release the aio context,
and when the completion bh later continues the cleanup process it will be
executed without ownership of the context. Is this a valid situation?

Do you have an example where this happens?

Normally, leaving the coroutine means that the AioContext lock is
released, but it is later reentered from the same AioContext, so the
lock will be taken again.

I was wrong. Coroutines do acquire aio context when reentered.



Moreover, if a yield happens, the bs that is being deleted has zero refcount
but is still present in the lists graph_bdrv_states and all_bdrv_states and
can be accidentally accessed. Shouldn't we remove it from these lists ASAP
when the deletion process starts, as we do for monitor_bdrv_states?

Hm, I think it should only disappear when the image file is actually
closed. But in practice, it probably makes little difference either way.


I'm still worried about the period of time from when the coroutine yields
until it is reentered: it looks like the aio context can be grabbed by other
code and the bs can be treated as valid.


I have no example of such a situation either, but I see the bdrv_aio_flush
and bdrv_co_flush_to_disk callbacks, which are called during flush and
can take an unpredictable time to complete (e.g. raw_aio_flush in file-win32
uses a thread pool inside to process the request, and RBD can also be
affected, but I didn't dig deep enough to be sure).


If the main loop starts processing the next qmp command before completion is
called, the lists will be in an inconsistent state, and hard-to-debug
use-after-free bugs and crashes can happen.


The fix seems trivial and shouldn't break any existing code.

---

diff --git a/block.c b/block.c
index 16615bc876..25c3b72520 100644
--- a/block.c
+++ b/block.c
@@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs)
 assert(bdrv_op_blocker_is_empty(bs));
 assert(!bs->refcnt);

-    bdrv_close(bs);
-
 /* remove from list, if necessary */
 if (bs->node_name[0] != '\0') {
 QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
 }
 QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list);

+    bdrv_close(bs);
+
 g_free(bs);
 }

--




Kevin




[Qemu-devel] aio context ownership during bdrv_close()

2019-04-26 Thread Anton Kuchin

I can't figure out the ownership of the aio context during bdrv_close().

As far as I understand, bdrv_unref() should be called with the aio context
acquired, to prevent concurrent operations (at least most usages in
blockdev.c explicitly acquire and release the context, but not all).


But if the refcount reaches zero and the bs is going to be deleted in
bdrv_close(), we need to ensure that the drain is finished, data is flushed
and there are no more pending coroutines and bottom halves, so the drain and
flush functions can enter a coroutine and yield in several places.
As a result control returns to the coroutine caller, which will release the
aio context, and when the completion bh later continues the cleanup process it
will be executed without ownership of the context. Is this a valid situation?


Moreover, if a yield happens, the bs that is being deleted has zero refcount
but is still present in the lists graph_bdrv_states and all_bdrv_states and
can be accidentally accessed. Shouldn't we remove it from these lists ASAP
when the deletion process starts, as we do for monitor_bdrv_states?






[Qemu-devel] Is IOThread for virtio-net a good idea?

2019-02-11 Thread Anton Kuchin
As far as I can see, IOThread offloading is currently used only for block
devices, and all others are emulated by the main thread.


I expect that network devices can also benefit from processing in a
separate thread, but I couldn't find any recent work in this direction.
I'm going to implement a PoC, but I want to ask if you know of any previous
attempts, and whether you know why it could be a total waste of time. Are there
fundamental obstacles that prevent handling network emulation in an IOThread?





[Qemu-devel] [PATCH] block: split block/qapi.c to avoid linking utilities with qapi

2019-01-28 Thread Anton Kuchin
Only part of block/qapi.c is used by qemu-io, qemu-nbd and qemu-img,
and it's not really about QAPI, so move it to a separate file to reduce the
amount of unused code linked into the utilities and avoid unnecessary
dependencies.

Signed-off-by: Anton Kuchin 
---
 block.c   |   2 +-
 block/Makefile.objs   |   3 +-
 block/bdrv_info.c | 582 ++
 block/qapi.c  | 553 +---
 hmp.c |   2 +-
 include/block/{qapi.h => bdrv_info.h} |   6 +-
 monitor.c |   2 +-
 qemu-img.c|   2 +-
 qemu-io-cmds.c|   2 +-
 9 files changed, 593 insertions(+), 561 deletions(-)
 create mode 100644 block/bdrv_info.c
 rename include/block/{qapi.h => bdrv_info.h} (97%)

diff --git a/block.c b/block.c
index 4f5ff2cc12..a14d64c2fd 100644
--- a/block.c
+++ b/block.c
@@ -43,7 +43,7 @@
 #include "qemu/notify.h"
 #include "qemu/option.h"
 #include "qemu/coroutine.h"
-#include "block/qapi.h"
+#include "block/bdrv_info.h"
 #include "qemu/timer.h"
 #include "qemu/cutils.h"
 #include "qemu/id.h"
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 7a81892a52..aab92d59ec 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -14,7 +14,7 @@ block-obj-y += quorum.o
 block-obj-y += blkdebug.o blkverify.o blkreplay.o
 block-obj-$(CONFIG_PARALLELS) += parallels.o
 block-obj-y += blklogwrites.o
-block-obj-y += block-backend.o snapshot.o qapi.o
+block-obj-y += block-backend.o snapshot.o bdrv_info.o
 block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += file-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
@@ -41,6 +41,7 @@ block-obj-y += throttle.o copy-on-read.o
 block-obj-y += crypto.o
 
 common-obj-y += stream.o
+common-obj-y += qapi.o
 
 nfs.o-libs := $(LIBNFS_LIBS)
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
diff --git a/block/bdrv_info.c b/block/bdrv_info.c
new file mode 100644
index 00..425045421b
--- /dev/null
+++ b/block/bdrv_info.c
@@ -0,0 +1,582 @@
+/*
+ * Block layer qmp and info dump related functions
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/bdrv_info.h"
+#include "block/block_int.h"
+#include "block/write-threshold.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-block-core.h"
+#include "qapi/qobject-output-visitor.h"
+#include "qapi/qapi-visit-block-core.h"
+#include "qapi/qmp/qbool.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qnum.h"
+#include "qapi/qmp/qstring.h"
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+
+BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
+BlockDriverState *bs, Error **errp)
+{
+ImageInfo **p_image_info;
+BlockDriverState *bs0;
+BlockDeviceInfo *info;
+
+if (!bs->drv) {
+error_setg(errp, "Block device %s is ejected", bs->node_name);
+return NULL;
+}
+
+info = g_malloc0(sizeof(*info));
+info->file   = g_strdup(bs->filename);
+info->ro = bs->read_only;
+info->drv= g_strdup(bs->drv->format_name);
+info->encrypted  = bs->encrypted;
+info->encryption_key_missing = false;
+
+info->cache = g_new(BlockdevCacheInfo, 1);
+*info->cache = (BlockdevCacheInfo) {
+.writeback  = blk ? blk_enable_write_cache(blk) : true,
+.direct = !!(bs->open_f

Re: [Qemu-devel] [PATCH] qmp: Deprecate query-nodes option of query-blockstats

2019-01-28 Thread Anton Kuchin



On 28/01/2019 18:37, Kevin Wolf wrote:

Am 28.01.2019 um 16:15 hat Anton Kuchin geschrieben:

This option is broken since a6baa60807 in v2.9 and returns mostly
zeroes instead of real stats, because the actual querying of the BlockStats
that reside in the blk is missing.

And it makes no sense, because with this option BlockDriverState-s
are iterated, but BlockAcctStats belong to the BlockBackend and not the BDS
since 7f0e9da6f13 in v2.5

Signed-off-by: Anton Kuchin 

Isn't query-nodes the only way to get wr_highest_offset for the protocol
layer? oVirt depends on this, as far as I know.


Isn't it reported without the query-nodes parameter as "wr_highest_offset"
of the "parent" stats entry?


What we get with query-nodes is essentially the same output, but without
correct data from the backend and with one more copy of almost empty data
from the protocol layer.
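
For reference, the option under discussion is the boolean argument of the
QMP command, e.g.:

{ "execute": "query-blockstats", "arguments": { "query-nodes": true } }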


I'll try to find usages of this option in oVirt code.



Kevin





Re: [Qemu-devel] [PATCH 0/2] block: add blk_lookup() for getting device by node_name

2019-01-28 Thread Anton Kuchin

On 28/01/2019 17:47, Kevin Wolf wrote:

Am 28.01.2019 um 15:27 hat Anton Kuchin geschrieben:

Some HMP and QMP commands target a BlockBackend, but
for hotplugged devices the name of the BB is deprecated; instead the
name of the root BlockDriverState is set. These patches add
functions to search for the BB by the attached root BDS name.

This approach isn't perfect, but I couldn't invent a better
one, and I believe it's more convenient than accessing the BB
by QOM path.

There could be more than one BlockBackend attached to a single node, so
this approach is simply wrong.


I was thinking about this, but couldn't imagine a configuration where a node
has more than one root. Can you give an example, please, so I
understand this case better.



I think the qdev ID is actually not only a pretty convenient way, but in
fact the only logical way to address a guest device (and BlockBackends
that can be accessed by the monitor are essentially a part of the guest
device).


As far as I remember, backends currently have an empty qdev ID, so the only
way to address them now is a QOM path like
".../my_hotplug_drive/virtio-backend". So I have to remember the name of
the root driver it is connected to and also add this "/virtio-backend"
suffix.


Can you explain a bit what "monitor referenced" backends are, and which
ones can be accessed from the monitor and which cannot?



Does your series allow to perform any operation that isn't possible
currently? If so, it's probably this operation that needs to be fixed to
either accept node names (if it's a backend operation) or a device ID
(if it's a frontend operation).


Yes. It fixes latency histogram setting, nbd server binding to the remove
event, and a couple of hmp commands. I also suspect that this can affect
migration, but I couldn't figure out what name is used for identifying drives.


Some of this code is also linked into utilities (qemu-img, qemu-io-cmds,
...) which would require linking all the QAPI stuff into them, and that seems
to be overkill. I'll prepare a patch to eliminate this dependency and allow
using QOM here if we decide that this is the best way.



Kevin





[Qemu-devel] [PATCH] qmp: Deprecate query-nodes option of query-blockstats

2019-01-28 Thread Anton Kuchin
This option is broken since a6baa60807 in v2.9 and returns mostly
zeroes instead of real stats, because the actual querying of the BlockStats
that reside in the blk is missing.

And it makes no sense, because with this option BlockDriverState-s
are iterated, but BlockAcctStats belong to the BlockBackend and not the BDS
since 7f0e9da6f13 in v2.5

Signed-off-by: Anton Kuchin 
---
 block/qapi.c | 26 +++---
 qapi/block-core.json |  8 +---
 qemu-deprecated.texi |  5 +
 3 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index c66f949db8..19d4b4ee42 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -502,8 +502,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
  &ds->x_flush_latency_histogram);
 }
 
-static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
-bool blk_level)
+static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs)
 {
 BlockStats *s = NULL;
 
@@ -517,7 +516,7 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
 /* Skip automatically inserted nodes that the user isn't aware of in
  * a BlockBackend-level command. Stay at the exact node for a node-level
  * command. */
-while (blk_level && bs->drv && bs->implicit) {
+while (bs->drv && bs->implicit) {
 bs = backing_bs(bs);
 assert(bs);
 }
@@ -531,12 +530,12 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
 
 if (bs->file) {
 s->has_parent = true;
-s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
+s->parent = bdrv_query_bds_stats(bs->file->bs);
 }
 
-if (blk_level && bs->backing) {
+if (bs->backing) {
 s->has_backing = true;
-s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
+s->backing = bdrv_query_bds_stats(bs->backing->bs);
 }
 
 return s;
@@ -577,21 +576,10 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 {
 BlockStatsList *head = NULL, **p_next = &head;
 BlockBackend *blk;
-BlockDriverState *bs;
 
 /* Just to be safe if query_nodes is not always initialized */
 if (has_query_nodes && query_nodes) {
-for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
-BlockStatsList *info = g_malloc0(sizeof(*info));
-AioContext *ctx = bdrv_get_aio_context(bs);
-
-aio_context_acquire(ctx);
-info->value = bdrv_query_bds_stats(bs, false);
-aio_context_release(ctx);
-
-*p_next = info;
-p_next = &info->next;
-}
+error_setg(errp, "Option query_nodes is deprecated");
 } else {
 for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
 BlockStatsList *info;
@@ -604,7 +592,7 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 }
 
 aio_context_acquire(ctx);
-s = bdrv_query_bds_stats(blk_bs(blk), true);
+s = bdrv_query_bds_stats(blk_bs(blk));
 s->has_device = true;
 s->device = g_strdup(blk_name(blk));
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 91685be6c2..2dd5f6032c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -892,13 +892,7 @@
 #
 # Query the @BlockStats for all virtual block devices.
 #
-# @query-nodes: If true, the command will query all the block nodes
-#   that have a node name, in a list which will include "parent"
-#   information, but not "backing".
-#   If false or omitted, the behavior is as before - query all the
-#   device backends, recursively including their "parent" and
-#   "backing". Filter nodes that were created implicitly are
-#   skipped over in this mode. (Since 2.3)
+# @query-nodes: deprecated since 3.2
 #
 # Returns: A list of @BlockStats for each virtual block devices.
 #
diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index 219206a836..e1e04ced7d 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -112,6 +112,11 @@ Use ``device_add'' for hotplugging vCPUs instead of ``cpu-add''.  See
 documentation of ``query-hotpluggable-cpus'' for additional
 details.
 
+@subsection query-blockstats (since 3.2)
+
+"query-nodes" parameter is not supported anymore because blockstats
+are not a prorerty of node.
+
 @section Human Monitor Protocol (HMP) commands
 
@subsection The hub_id parameter of 'hostfwd_add' / 'hostfwd_remove' (since 3.1)
-- 
2.19.1




[Qemu-devel] [PATCH 1/2] block: add functions to search BlockBackend by root BDS name

2019-01-28 Thread Anton Kuchin
BlockBackend name is empty if it is added with '-blockdev' and '-device'
options or hotplugged with QMP, but callers still expect the backend to be
accessible by name for operations like commit or statistics access.
Introduce a blk_lookup function to search both by name and by BDS-root node_name.

Signed-off-by: Anton Kuchin 
---
 block/block-backend.c  | 29 +
 include/sysemu/block-backend.h |  7 +++
 2 files changed, 36 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 60d37a0c3d..86a492853c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -684,6 +684,35 @@ static BlockBackend *bdrv_first_blk(BlockDriverState *bs)
 return NULL;
 }
 
+/*
+ * Return the BlockBackend that has attached BDS-tree root with
+ * node_name @node_name if it exists, else null.
+ * @node_name must not be null.
+ */
+static BlockBackend *blk_by_root_name(const char *node_name)
+{
+BlockBackend *blk = NULL;
+
+assert(node_name);
+while ((blk = blk_all_next(blk)) != NULL) {
+BlockDriverState *bs = blk_bs(blk);
+if (bs && !strcmp(node_name, bs->node_name)) {
+return blk;
+}
+}
+return NULL;
+}
+
+BlockBackend *blk_lookup(const char *name)
+{
+assert(name);
+BlockBackend *blk = blk_by_name(name);
+if (!blk) {
+blk = blk_by_root_name(name);
+}
+return blk;
+}
+
 /*
  * Returns true if @bs has an associated BlockBackend.
  */
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c96bcdee14..290b8f8fc9 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -85,6 +85,13 @@ void blk_unref(BlockBackend *blk);
 void blk_remove_all_bs(void);
 const char *blk_name(const BlockBackend *blk);
 BlockBackend *blk_by_name(const char *name);
+
+/*
+ * Search BlockBackend by name or root BlockDriverSate node_name.
+ * Hotplug BlockBackends have no name so need to also check BDS-tree roots
+ * @name must not be null.
+ */
+BlockBackend *blk_lookup(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
 BlockBackend *blk_all_next(BlockBackend *blk);
 bool monitor_add_blk(BlockBackend *blk, const char *name, Error **errp);
-- 
2.19.1




[Qemu-devel] [PATCH 0/2] block: add blk_lookup() for getting device by node_name

2019-01-28 Thread Anton Kuchin
Some HMP and QMP commands target a BlockBackend, but
for hotplugged devices the name of the BB is deprecated; instead the
name of the root BlockDriverState is set. These patches add
functions to search for the BB by the attached root BDS name.

This approach isn't perfect, but I couldn't invent a better
one, and I believe it's more convenient than accessing the BB
by QOM path.

Anton Kuchin (2):
  block: add functions to search BlockBackend by root BDS name
  block: migrate callers from blk_by_name to blk_lookup

 block/block-backend.c  | 29 +
 blockdev-nbd.c |  2 +-
 blockdev.c |  6 +++---
 hmp.c  |  2 +-
 include/sysemu/block-backend.h |  7 +++
 5 files changed, 41 insertions(+), 5 deletions(-)

-- 
2.19.1




[Qemu-devel] [PATCH 2/2] block: migrate callers from blk_by_name to blk_lookup

2019-01-28 Thread Anton Kuchin
Signed-off-by: Anton Kuchin 
---
 blockdev-nbd.c | 2 +-
 blockdev.c | 6 +++---
 hmp.c  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index d73ac1b026..f2ea7318cf 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -162,7 +162,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
 return;
 }
 
-on_eject_blk = blk_by_name(device);
+on_eject_blk = blk_lookup(device);
 
 bs = bdrv_lookup_bs(device, device, errp);
 if (!bs) {
diff --git a/blockdev.c b/blockdev.c
index a8fa8748a9..bb01c41038 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1082,7 +1082,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 BlockDriverState *bs;
 AioContext *aio_context;
 
-blk = blk_by_name(device);
+blk = blk_lookup(device);
 if (!blk) {
 monitor_printf(mon, "Device '%s' not found\n", device);
 return;
@@ -3066,7 +3066,7 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
 return;
 }
 
-blk = blk_by_name(id);
+blk = blk_lookup(id);
 if (!blk) {
 error_report("Device '%s' not found", id);
 return;
@@ -4431,7 +4431,7 @@ void qmp_x_block_latency_histogram_set(
 bool has_boundaries_flush, uint64List *boundaries_flush,
 Error **errp)
 {
-BlockBackend *blk = blk_by_name(device);
+BlockBackend *blk = blk_lookup(device);
 BlockAcctStats *stats;
 int ret;
 
diff --git a/hmp.c b/hmp.c
index b2a2b1f84e..7b03d5c1d7 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2460,7 +2460,7 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict)
 Error *err = NULL;
 int ret;
 
-blk = blk_by_name(device);
+blk = blk_lookup(device);
 if (!blk) {
 BlockDriverState *bs = bdrv_lookup_bs(NULL, device, &err);
 if (bs) {
-- 
2.19.1




Re: [Qemu-devel] [RFC] block: Is name of BlockBackend deprecated with -blockdev parameter?

2018-12-14 Thread Anton Kuchin
I want to make it consistent, but I'm not sure I understand all
aspects of the current state and the legacy clients we need to support.


The initial idea was to have a single function blk_lookup(char *name)
and search all namespaces internally, as they are guaranteed not to
intersect. This would be a little slower, but I see no hot paths that
would be affected. This way the transition would also be transparent
to users, though I'm not sure it can be done in a backward-compatible
way.

Do you have any suggestions on how to do this correctly? Roughly what
I have in mind is sketched below.
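
(A sketch only: step 3 is not in the posted series, and the
blk_by_qdev_id() call is an assumption about what the qdev lookup
would look like here:)

    /* Resolve @name across the non-intersecting name spaces. */
    BlockBackend *blk_lookup(const char *name)
    {
        /* 1. Legacy BlockBackend name (-drive id=...) */
        BlockBackend *blk = blk_by_name(name);

        /* 2. Root BDS node name (-blockdev node-name=...) */
        if (!blk) {
            blk = blk_by_root_name(name);
        }

        /* 3. Possibly also the qdev ID name space */
        if (!blk) {
            blk = blk_by_qdev_id(name, NULL);
        }
        return blk;
    }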

On 11/12/2018 12:47, Kevin Wolf wrote:

[...]

Anton Kuchin  writes:
[...]

As far as I understand, all named backends are stored in the
monitor_block_backends list, but I don't understand the point of
having this list, or why the parse_drive() function doesn't call
monitor_add_blk() like blockdev_init() does with the -drive option.
Can someone give me a hint on this?


And what is monitor_block_backends used for? What does a monitor
reference mean in this context, and how do backends in this list
differ from all others?



I also noticed that some commands fall back to searching by qdev_id or
BDS->node_name, but at creation time (both in bdrv_assign_node_name()
and monitor_add_blk()) it is already checked that names are unique
across these namespaces, so maybe it would be useful to introduce a
generic search function?

Yes, BlockBackend names are not supposed to be used in the external
interfaces any more. If the QMP command is about the backend, it will
take a node name, and if it's about the guest device (which, with
removable media, may not even have a node attached), it will take a
qdev ID.

The implementation of qmp_x_block_latency_histogram_set() is
incorrect; it shouldn't use blk_by_name(). I wonder where it got that
pattern from, because it was added long after all other QMP commands
had switched to qmp_get_blk() or bdrv_lookup_bs().
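
For comparison, the pattern the converted commands use is roughly the
following (a sketch; this is the same bdrv_lookup_bs() call visible in
the nbd-server-add hunk of patch 2, where @device may be either a
backend name or a node name):

    /* Accepts a BlockBackend name or a node name for @device. */
    BlockDriverState *bs = bdrv_lookup_bs(device, device, errp);
    if (!bs) {
        return;
    }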


Do we still need blk_by_name() to be public? Maybe we can replace it
with a new function like blk_lookup() and keep blk_by_name() for
internal use only, or fix its behavior to search by qdev_id first and,
if that fails, fall back to the old way of searching by the device
name, e.g. as in the sketch below.
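
(A sketch of that second option; blk_lookup_compat is a hypothetical
name, and it assumes blk_by_qdev_id() accepts a NULL error pointer:)

    /* qdev ID first, legacy BlockBackend name as fallback. */
    BlockBackend *blk_lookup_compat(const char *name)
    {
        BlockBackend *blk = blk_by_qdev_id(name, NULL);
        if (!blk) {
            blk = blk_by_name(name);
        }
        return blk;
    }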




hmp_commit() should indeed use bdrv_lookup_bs() to also accept node
names.
But it also checks the blk state before committing, so I'm not quite
sure it can work correctly without a blk; see the sketch below for the
check I mean.
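
(As far as I can tell, the state check in question is roughly the
following part of hmp_commit(); a sketch, assuming blk_is_available()
is the check meant, in the message style visible in the hunk above:

    if (!blk_is_available(blk)) {
        monitor_printf(mon, "Device '%s' has no medium\n", device);
        return;
    }

so a pure node-name lookup would at least need an equivalent check.)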

I think we reviewed QMP to make sure we converted everything
and forgot about HMP commands that aren't mapped to QMP.

Kevin


Anton



[Qemu-devel] [RFC] block: Is name of BlockBackend deprecated with -blockdev parameter?

2018-12-10 Thread Anton Kuchin

Hello,

I'm trying to switch from the -drive parameter to -blockdev + -device
and am having problems. It looks like with this option I have no way
to set the name of the created BlockBackend, but some QMP and HMP
commands try to find the blk with blk_by_name() and fail to locate my
device (e.g. hmp_commit, qmp_x_block_latency_histogram_set, ...). Was
this intentional, and are BB names going to be deprecated?


This also seems to be the case for all block devices hotplugged via
QMP, as they use the same code path.


As far as I understand, all named backends are stored in the
monitor_block_backends list, but I don't understand the point of
having this list, or why the parse_drive() function doesn't call
monitor_add_blk() like blockdev_init() does with the -drive option.
Can someone give me a hint on this?


I also noticed that some commands fall back to searching by qdev_id or
BDS->node_name, but at creation time (both in bdrv_assign_node_name()
and monitor_add_blk()) it is already checked that names are unique
across these namespaces, so maybe it would be useful to introduce a
generic search function?


Thanks,
Anton