Re: [PATCH 1/4] vhost: Re-enable vrings after setting features
On 13/04/2023 14:03, Stefan Hajnoczi wrote:
On Thu, 13 Apr 2023 at 04:20, Hanna Czenczek wrote:
On 12.04.23 22:51, Stefan Hajnoczi wrote:
On Tue, Apr 11, 2023 at 05:05:12PM +0200, Hanna Czenczek wrote:

If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature, setting the vhost features will set this feature, too. Doing so disables all vrings, which may not be intended.

For example, enabling or disabling logging during migration requires setting those features (to set or unset VHOST_F_LOG_ALL), which will automatically disable all vrings. In either case, the VM is running (disabling logging is done after a failed or cancelled migration, and only once the VM is running again, see comment in memory_global_dirty_log_stop()), so the vrings should really be enabled. As a result, the back-end seems to hang.

To fix this, we must remember whether the vrings are supposed to be enabled, and, if so, re-enable them after a SET_FEATURES call that set VHOST_USER_F_PROTOCOL_FEATURES.

It seems less than ideal that there is a short period in which the VM is running but the vrings will be stopped (between SET_FEATURES and SET_VRING_ENABLE). To fix this, we would need to change the protocol, e.g. by introducing a new flag or vhost-user protocol feature to disable disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add new functions for setting/clearing singular feature bits (so that F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).

Even with such a potential addition to the protocol, we still need this fix here, because we cannot expect that back-ends will implement this addition.
Signed-off-by: Hanna Czenczek
---
 include/hw/virtio/vhost.h | 10 ++
 hw/virtio/vhost.c         | 13 +
 2 files changed, 23 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..2fe02ed5d4 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -90,6 +90,16 @@ struct vhost_dev {
     int vq_index_end;
     /* if non-zero, minimum required value for max_queues */
     int num_queues;
+
+    /*
+     * Whether the virtqueues are supposed to be enabled (via
+     * SET_VRING_ENABLE).  Setting the features (e.g. for
+     * enabling/disabling logging) will disable all virtqueues if
+     * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
+     * re-enable them if this field is set.
+     */
+    bool enable_vqs;
+
     /**
      * vhost feature handling requires matching the feature set
      * offered by a backend which may be a subset of the total

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a266396576..cbff589efa 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -50,6 +50,8 @@ static unsigned int used_memslots;
 static QLIST_HEAD(, vhost_dev) vhost_devices =
     QLIST_HEAD_INITIALIZER(vhost_devices);
 
+static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
+
 bool vhost_has_free_slot(void)
 {
     unsigned int slots_limit = ~0U;
@@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
         }
     }
 
+    if (dev->enable_vqs) {
+        /*
+         * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all
+         * virtqueues, even if that was not intended; re-enable them if
+         * necessary.
+         */
+        vhost_dev_set_vring_enable(dev, true);
+    }
+
 out:
     return r;
 }
@@ -1896,6 +1907,8 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
 static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
 {
+    hdev->enable_vqs = enable;
+
     if (!hdev->vhost_ops->vhost_set_vring_enable) {
         return 0;
     }

The vhost-user spec doesn't say that VHOST_F_LOG_ALL needs to be toggled at runtime and I don't think VHOST_USER_SET_PROTOCOL_FEATURES is intended to be used like that. This issue shows why doing so is a bad idea.

VHOST_F_LOG_ALL does not need to be toggled to control logging. Logging is controlled at runtime by the presence of the dirty log (VHOST_USER_SET_LOG_BASE) and the per-vring logging flag (VHOST_VRING_F_LOG).

Technically, the spec doesn’t say that SET_LOG_BASE is required. It says: “To start/stop logging of data/used ring writes, the front-end may send messages VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with VHOST_VRING_F_LOG in ring’s flags set to 1/0, respectively.”

(So the spec also very much does imply that toggling F_LOG_ALL at runtime is a valid way to enable/disable logging. If we were to no longer do that, we should clarify it there.)

I missed that VHOST_VRING_F_LOG only controls logging used ring writes while writes to descriptors are always logged when VHOST_F_LOG_ALL is set. I agree that the spec does require VHOST_F_LOG_ALL to be toggled at runtime. What I suggested won't work. But is there a valid use-case for logging some dirty memory but not all? I
Re: [Virtio-fs] [RFC 2/2] vhost-user-fs: Implement stateful migration
On 21/03/2023 19:49, Hanna Czenczek wrote:
On 20.03.23 13:39, Anton Kuchin wrote:
On 20/03/2023 11:33, Hanna Czenczek wrote:
On 17.03.23 19:37, Anton Kuchin wrote:
On 17/03/2023 19:52, Hanna Czenczek wrote:
On 17.03.23 18:19, Anton Kuchin wrote:
On 13/03/2023 19:48, Hanna Czenczek wrote:

A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state

We get/set the latter via the new vhost-user operations FS_SET_STATE_FD, FS_GET_STATE, and FS_SET_STATE.

[...]

 static const VMStateDescription vuf_vmstate = {
     .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        {
+            .name = "back-end",
+            .info = &(const VMStateInfo) {
+                .name = "virtio-fs back-end state",
+                .get = vuf_load_state,
+                .put = vuf_save_state,
+            },
+        },

I've been working on the stateless migration patch [1], and there it was discussed that we need to keep some kind of blocker by default if orchestrators rely on the unmigratable field in the virtio-fs vmstate to block the migration. For this purpose I've implemented a flag that selects "none" or "external" and is checked in pre_save, so it could be extended with an "internal" option. We didn't come to a conclusion on whether we also need to check incoming migration; the discussion has stopped for a while, but I'm going back to it now.

I would appreciate it if you have time to take a look at the discussion and consider the idea proposed there to store the internal state as a subsection of the vmstate, to make it an option but not mandatory.

[1] https://patchew.org/QEMU/20230217170038.1273710-1-antonkuc...@yandex-team.ru/

So far I’ve mostly considered these issues orthogonal. If your stateless migration goes in first, then state is optional and I’ll adjust this series. If stateful migration goes in first, then your series can simply make state optional by introducing the external option, no?

Not really. State can be easily extended by subsections but not trimmed. Maybe this can be worked around by defining two types of vmstate and selecting the correct one at migration, but I'm not sure.

I thought your patch included a switch on the vhost-user-fs device (on the qemu side) to tell qemu what migration data to expect. Can we not transfer a 0-length state for 'external', and assert this on the destination side?

Looks like I really need to make the description of my patch and the documentation more clear :) My patch proposes a switch on the _source_ side to select which data to save in the stream, mostly to protect old orchestrators that don't expect virtio-fs to be migratable (and for the internal case it can be extended to select whether qemu needs to request state from the backend). Michael insists that we also need a check on the destination, but I disagree, because I believe we can figure this out from the stream data without additional device flags.

But maybe we could also consider making stateless migration a special case of stateful migration; if we had stateful migration, can’t we just implement stateless migration by telling virtiofsd that it should submit a special “I have no state” pseudo-state, i.e. by having a switch on virtiofsd instead?

Sure. The backend can send an empty state (as your patch treats 0 length as a valid response and not an error) or a dummy state that can be recognized as stateless. The only potential problem is that then we need support in the backend for the new commands even just to return a dummy state, and if the backend can support both types then we'll need some switch in the backend to reply with real or empty state.

Yes, exactly. Off the top of my head, some downsides of that approach would be (1) it’d need a switch on the virtiofsd side, not on the qemu side (not sure if that’s a downside, but a difference for sure),

Why would you? It seems to me that this affects only how qemu treats the vmstate of the device. If the state was requested, the backend sends it to qemu. If a state subsection is present in the stream, qemu sends it to the backend for loading. The stateless one just doesn't request state from the backend. Or am I missing something?

and (2) we’d need at least some support for this on the virtiofsd side, i.e. practically can’t come quicker than stateful migration support.

Not much, essentially this is just a reconnect. I've sent a draft of a reconnect patch for the old C virtiofsd; for the rust version it takes much longer because I'm learning rust and I'm not really good at it yet.

I meant these two downsides not for your proposal, but instead if we implemented stateless migration only in the back-end; i.e. the front-end would only implement stateful migration, and the back-end would send and accept an empty state. Then, qemu would always request a “state”
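The "0-length state means stateless" idea from this exchange can be sketched as a self-contained fragment. This is purely illustrative, not QEMU code: as in the proposed vuf_save_state(), the state blob is prefixed by its total length as a big-endian 64-bit value, and a zero prefix is treated as "no internal state" rather than as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Serialize a 64-bit value big-endian, as qemu_put_be64() would. */
static void put_be64(uint8_t *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Deserialize it again, as qemu_get_be64() would. */
static uint64_t get_be64(const uint8_t *buf)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return v;
}

/* The destination's decision: a zero length prefix is the valid
 * "stateless back-end" case, not a load error. */
static bool stream_has_state(const uint8_t *stream)
{
    return get_be64(stream) > 0;
}
```

With this convention, a stateless back-end and a stateful one share one stream format, and the destination needs no extra device flag to tell them apart — which is the point Anton argues above.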
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 01/03/2023 17:33, Michael S. Tsirkin wrote:
On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote:

We can't rely here entirely on the destination to block this, because if the VM is migrated to a file and then can't be loaded by the destination, there is no way to fall back and resume the source, so we need to have some kind of blocker on the source by default.

So I commented about a fallback. IMO it's just the orchestrator being silly: don't kill the source qemu until you have made sure the destination loaded the file, or re-load it, and all will be well.

I agree that it is always better to keep the source alive until the destination is loaded and ready to take off. But in some cases resources are limited or strictly partitioned, so we can't keep two VMs alive at the same time, and the best option is to check all we need on the source to make sure the destination will run. Off the top of my head: a host-sized VM would need an entire additional host to migrate properly, plus lots of memory streaming that can cause huge downtime; but if the memory is in shmem, a local migration to update qemu can take as little as saving the device state to a file and running a new qemu to load the devices, take over the memory, and continue running the guest.

But a bigger issue that this highlights is simply that if migrating to a file, you have no idea at all where the file will be loaded. Maybe some orchestrators know, but I don't see how we can be sure all of them know. The only time we know whether the file is loaded on the same host where it was saved is when we actually load it.

Yes. Migration to a file requires the orchestrator to be well aware of what it is doing, because the file always contains only part of the state. This is hard, but sometimes there are no other good options.
Re: [Virtio-fs] [RFC 2/2] vhost-user-fs: Implement stateful migration
On 17/03/2023 19:52, Hanna Czenczek wrote: On 17.03.23 18:19, Anton Kuchin wrote: On 13/03/2023 19:48, Hanna Czenczek wrote: A virtio-fs device's VM state consists of: - the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE) - the back-end's (virtiofsd's) internal state We get/set the latter via the new vhost-user operations FS_SET_STATE_FD, FS_GET_STATE, and FS_SET_STATE. Signed-off-by: Hanna Czenczek --- hw/virtio/vhost-user-fs.c | 171 +- 1 file changed, 170 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 83fc20e49e..df1fb02acc 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -20,8 +20,10 @@ #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" #include "qemu/error-report.h" +#include "qemu/memfd.h" #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-user-fs.h" +#include "migration/qemu-file-types.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" @@ -298,9 +300,176 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +/** + * Fetch the internal state from the back-end (virtiofsd) and save it + * to `f`. 
+ */ +static int vuf_save_state(QEMUFile *f, void *pv, size_t size, + const VMStateField *field, JSONWriter *vmdesc) +{ + VirtIODevice *vdev = pv; + VHostUserFS *fs = VHOST_USER_FS(vdev); + int memfd = -1; + /* Size of the shared memory through which to transfer the state */ + const size_t chunk_size = 4 * 1024 * 1024; + size_t state_offset; + ssize_t remaining; + void *shm_buf; + Error *local_err = NULL; + int ret, ret2; + + /* Set up shared memory through which to receive the state from virtiofsd */ + shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size, + F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW, + &memfd, &local_err); + if (!shm_buf) { + error_report_err(local_err); + ret = -ENOMEM; + goto early_fail; + } + + /* Share the SHM area with virtiofsd */ + ret = vhost_fs_set_state_fd(&fs->vhost_dev, memfd, chunk_size); + if (ret < 0) { + goto early_fail; Don't we need some log message here too? Sure, why not. There are other places in this patch that just return -errno but print no error, I think they could all use a verbose error message. 
+ } + + /* Receive the virtiofsd state in chunks, and write them to `f` */ + state_offset = 0; + do { + size_t this_chunk_size; + + remaining = vhost_fs_get_state(&fs->vhost_dev, state_offset, + chunk_size); + if (remaining < 0) { + ret = remaining; + goto fail; + } + + /* Prefix the whole state by its total length */ + if (state_offset == 0) { + qemu_put_be64(f, remaining); + } + + this_chunk_size = MIN(remaining, chunk_size); + qemu_put_buffer(f, shm_buf, this_chunk_size); + state_offset += this_chunk_size; + } while (remaining >= chunk_size); + + ret = 0; +fail: + /* Have virtiofsd close the shared memory */ + ret2 = vhost_fs_set_state_fd(&fs->vhost_dev, -1, 0); + if (ret2 < 0) { + error_report("Failed to remove state FD from the vhost-user-fs back " + "end: %s", strerror(-ret)); + if (ret == 0) { + ret = ret2; + } + } + +early_fail: + if (shm_buf) { + qemu_memfd_free(shm_buf, chunk_size, memfd); + } + + return ret; +} + +/** + * Load the back-end's (virtiofsd's) internal state from `f` and send + * it over to that back-end. + */ +static int vuf_load_state(QEMUFile *f, void *pv, size_t size, + const VMStateField *field) +{ + VirtIODevice *vdev = pv; + VHostUserFS *fs = VHOST_USER_FS(vdev); + int memfd = -1; + /* Size of the shared memory through which to transfer the state */ + const size_t chunk_size = 4 * 1024 * 1024; + size_t state_offset; + uint64_t remaining; + void *shm_buf; + Error *local_err = NULL; + int ret, ret2; + + /* The state is prefixed by its total length, read that first */ + remaining = qemu_get_be64(f); + + /* Set up shared memory through which to send the state to virtiofsd */ + shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size, + F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW, + &memfd, &local_err); + if (!shm_buf) { + error_report_err(local_err); + ret = -ENOMEM; + goto early_fail; + } + + /* Share the SHM area with virtiofsd *
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 06/03/2023 23:53, Michael S. Tsirkin wrote:
On Mon, Mar 06, 2023 at 10:55:29PM +0200, Anton Kuchin wrote:
On 01/03/2023 22:22, Michael S. Tsirkin wrote:
On Wed, Mar 01, 2023 at 09:35:56PM +0200, Anton Kuchin wrote:

I do trust them :) I have to; otherwise we would need to pack all the properties and flags of qemu into the migration stream and validate them at the destination, or the entire migration ends up broken beyond repair if orchestrators tend to do stupid things and need babysitting.

This is not at all a crazy idea. It has some disadvantages too, esp. around cross-version migration, which is why we don't do this yet.

Indeed. I know VMMs that chose this path. But as long as qemu has decided to trust orchestrators, I think we'd better keep this consistent across the codebase. Only the ivshmem_pre_load callback bothers to check a device property, and virtio_net_tx_waiting_pre_load checks that the number of queue pairs doesn't exceed the allowed maximum; all other *_pre_load functions generally just initialize some parts of state that need to be set before the stream starts loading.

We do validate things by necessity, we just try to do as much as we can table-driven so it's not open-coded and less visible. We used to have a lot more callbacks; nowadays we try to keep it table-driven. But e.g. pci streams RO state and validates that it's the same on the destination.

Sorry for the late reply. I agree that checks should be done if we have the data at hand. But in my opinion the situation here is a little different: pci is emulated by qemu, and the RO state affects how emulation will work on the destination; the point of vhost-user is to outsource device emulation logic to daemons outside qemu and let the orchestrator manage both qemu and the daemons. So engineering additional flags in qemu that don't affect device operation, but exist only to check orchestrator correctness, is excessive in my opinion.
Re: [RFC 1/2] vhost-user: Add interface for virtio-fs migration
On 13/03/2023 19:48, Hanna Czenczek wrote:

Add a virtio-fs-specific vhost-user interface to facilitate migrating back-end-internal state. We plan to migrate the internal state simply as a binary blob after the streaming phase, so all we need is a way to transfer such a blob from and to the back-end. We do so by using a dedicated area of shared memory through which the blob is transferred in chunks.

This patch adds the following vhost operations (and implements them for vhost-user):

- FS_SET_STATE_FD: The front-end passes a dedicated shared memory area to the back-end. This area will be used to transfer state via the other two operations. (After the transfer, FS_SET_STATE_FD detaches the shared memory area again.)

- FS_GET_STATE: The front-end asks the back-end to place a chunk of internal state into the shared memory area.

- FS_SET_STATE: The front-end puts a chunk of internal state into the shared memory area, and asks the back-end to fetch it.

On the source side, the back-end is expected to serialize its internal state either when FS_SET_STATE_FD is invoked, or when FS_GET_STATE is invoked the first time. On subsequent FS_GET_STATE calls, it memcpy()s parts of that serialized state into the shared memory area.

On the destination side, the back-end is expected to collect the state blob over all FS_SET_STATE calls, and then deserialize and apply it once FS_SET_STATE_FD detaches the shared memory area.
Signed-off-by: Hanna Czenczek
---
 include/hw/virtio/vhost-backend.h |   9 ++
 include/hw/virtio/vhost.h         |  68 +++
 hw/virtio/vhost-user.c            | 138 ++
 hw/virtio/vhost.c                 |  29 +++
 4 files changed, 244 insertions(+)

diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index ec3fbae58d..fa3bd19386 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -42,6 +42,12 @@ typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque,
 typedef int (*vhost_backend_cleanup)(struct vhost_dev *dev);
 typedef int (*vhost_backend_memslots_limit)(struct vhost_dev *dev);
 
+typedef ssize_t (*vhost_fs_get_state_op)(struct vhost_dev *dev,
+                                         uint64_t state_offset, size_t size);
+typedef int (*vhost_fs_set_state_op)(struct vhost_dev *dev,
+                                     uint64_t state_offset, size_t size);
+typedef int (*vhost_fs_set_state_fd_op)(struct vhost_dev *dev, int memfd,
+                                        size_t size);
 typedef int (*vhost_net_set_backend_op)(struct vhost_dev *dev,
                                         struct vhost_vring_file *file);
 typedef int (*vhost_net_set_mtu_op)(struct vhost_dev *dev, uint16_t mtu);
@@ -138,6 +144,9 @@ typedef struct VhostOps {
     vhost_backend_init vhost_backend_init;
     vhost_backend_cleanup vhost_backend_cleanup;
     vhost_backend_memslots_limit vhost_backend_memslots_limit;
+    vhost_fs_get_state_op vhost_fs_get_state;
+    vhost_fs_set_state_op vhost_fs_set_state;
+    vhost_fs_set_state_fd_op vhost_fs_set_state_fd;
     vhost_net_set_backend_op vhost_net_set_backend;
     vhost_net_set_mtu_op vhost_net_set_mtu;
     vhost_scsi_set_endpoint_op vhost_scsi_set_endpoint;

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..b1ad9785dd 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -336,4 +336,72 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
                            struct vhost_inflight *inflight);
 int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
                            struct vhost_inflight *inflight);
+
+/**
+ * vhost_fs_set_state_fd(): Share memory with a virtio-fs vhost
+ * back-end for transferring internal state for the purpose of
+ * migration.  Calling this function again will have the back-end
+ * unregister (free) the previously shared memory area.
+ *
+ * @dev: The vhost device
+ * @memfd: File descriptor associated with the shared memory to share.
+ *         If negative, no memory area is shared, only releasing the
+ *         previously shared area, and announcing the end of transfer
+ *         (which, on the destination side, should lead to the
+ *         back-end deserializing and applying the received state).
+ * @size: Size of the shared memory area
+ *
+ * Returns 0 on success, and -errno on failure.
+ */
+int vhost_fs_set_state_fd(struct vhost_dev *dev, int memfd, size_t size);
+
+/**
+ * vhost_fs_get_state(): Request the virtio-fs vhost back-end to place
+ * a chunk of migration state into the shared memory area negotiated
+ * through vhost_fs_set_state_fd().  May only be used for migration,
+ * and only by the source side.
+ *
+ * The back-end-internal migration state is treated as a binary blob,
+ * which is transferred in chunks to fit into the shared memory area.
+ *
+ * @dev: The vhost device
+ * @state
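Based on the contract described above — the return value of a FS_GET_STATE call is the number of bytes remaining from the requested offset, so a value no larger than the chunk size signals the last chunk — the back-end side might look roughly like the following self-contained sketch. struct backend and backend_get_state are hypothetical names for illustration only, not part of any real virtiofsd or QEMU API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical back-end state for serving FS_GET_STATE requests.
 * On the source, the state is serialized once (e.g. when
 * FS_SET_STATE_FD arrives), then served window by window. */
struct backend {
    const uint8_t *serialized;   /* fully serialized internal state */
    size_t serialized_len;
    uint8_t *shm;                /* area shared via FS_SET_STATE_FD */
    size_t shm_size;
};

/* Model of handling one FS_GET_STATE: copy up to shm_size bytes
 * starting at state_offset into the shared area, and return the total
 * number of bytes remaining from that offset (so the front-end knows
 * when it has seen the last chunk). */
static ssize_t backend_get_state(struct backend *b, uint64_t state_offset)
{
    if (state_offset > b->serialized_len) {
        return -1;   /* invalid offset */
    }
    size_t remaining = b->serialized_len - state_offset;
    size_t this_chunk = remaining < b->shm_size ? remaining : b->shm_size;
    memcpy(b->shm, b->serialized + state_offset, this_chunk);
    return (ssize_t)remaining;
}
```

This mirrors the front-end loop in the companion RFC 2/2 patch, which keeps calling FS_GET_STATE with a growing offset while the returned "remaining" count is at least the chunk size.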
Re: [RFC 2/2] vhost-user-fs: Implement stateful migration
On 13/03/2023 19:48, Hanna Czenczek wrote: A virtio-fs device's VM state consists of: - the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE) - the back-end's (virtiofsd's) internal state We get/set the latter via the new vhost-user operations FS_SET_STATE_FD, FS_GET_STATE, and FS_SET_STATE. Signed-off-by: Hanna Czenczek --- hw/virtio/vhost-user-fs.c | 171 +- 1 file changed, 170 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 83fc20e49e..df1fb02acc 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -20,8 +20,10 @@ #include "hw/virtio/virtio-bus.h" #include "hw/virtio/virtio-access.h" #include "qemu/error-report.h" +#include "qemu/memfd.h" #include "hw/virtio/vhost.h" #include "hw/virtio/vhost-user-fs.h" +#include "migration/qemu-file-types.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" @@ -298,9 +300,176 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +/** + * Fetch the internal state from the back-end (virtiofsd) and save it + * to `f`. + */ +static int vuf_save_state(QEMUFile *f, void *pv, size_t size, + const VMStateField *field, JSONWriter *vmdesc) +{ +VirtIODevice *vdev = pv; +VHostUserFS *fs = VHOST_USER_FS(vdev); +int memfd = -1; +/* Size of the shared memory through which to transfer the state */ +const size_t chunk_size = 4 * 1024 * 1024; +size_t state_offset; +ssize_t remaining; +void *shm_buf; +Error *local_err = NULL; +int ret, ret2; + +/* Set up shared memory through which to receive the state from virtiofsd */ +shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size, + F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW, + &memfd, &local_err); +if (!shm_buf) { +error_report_err(local_err); +ret = -ENOMEM; +goto early_fail; +} + +/* Share the SHM area with virtiofsd */ +ret = vhost_fs_set_state_fd(&fs->vhost_dev, memfd, chunk_size); +if (ret < 0) { +goto early_fail; Don't we need some log message here too? 
+} + +/* Receive the virtiofsd state in chunks, and write them to `f` */ +state_offset = 0; +do { +size_t this_chunk_size; + +remaining = vhost_fs_get_state(&fs->vhost_dev, state_offset, + chunk_size); +if (remaining < 0) { +ret = remaining; +goto fail; +} + +/* Prefix the whole state by its total length */ +if (state_offset == 0) { +qemu_put_be64(f, remaining); +} + +this_chunk_size = MIN(remaining, chunk_size); +qemu_put_buffer(f, shm_buf, this_chunk_size); +state_offset += this_chunk_size; +} while (remaining >= chunk_size); + +ret = 0; +fail: +/* Have virtiofsd close the shared memory */ +ret2 = vhost_fs_set_state_fd(&fs->vhost_dev, -1, 0); +if (ret2 < 0) { +error_report("Failed to remove state FD from the vhost-user-fs back " + "end: %s", strerror(-ret)); +if (ret == 0) { +ret = ret2; +} +} + +early_fail: +if (shm_buf) { +qemu_memfd_free(shm_buf, chunk_size, memfd); +} + +return ret; +} + +/** + * Load the back-end's (virtiofsd's) internal state from `f` and send + * it over to that back-end. 
+ */ +static int vuf_load_state(QEMUFile *f, void *pv, size_t size, + const VMStateField *field) +{ +VirtIODevice *vdev = pv; +VHostUserFS *fs = VHOST_USER_FS(vdev); +int memfd = -1; +/* Size of the shared memory through which to transfer the state */ +const size_t chunk_size = 4 * 1024 * 1024; +size_t state_offset; +uint64_t remaining; +void *shm_buf; +Error *local_err = NULL; +int ret, ret2; + +/* The state is prefixed by its total length, read that first */ +remaining = qemu_get_be64(f); + +/* Set up shared memory through which to send the state to virtiofsd */ +shm_buf = qemu_memfd_alloc("vhost-fs-state", chunk_size, + F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW, + &memfd, &local_err); +if (!shm_buf) { +error_report_err(local_err); +ret = -ENOMEM; +goto early_fail; +} + +/* Share the SHM area with virtiofsd */ +ret = vhost_fs_set_state_fd(&fs->vhost_dev, memfd, chunk_size); +if (ret < 0) { +goto early_fail; +} + +/* + * Read the virtiofsd state in chunks from `f`, and send them over + * to virtiofsd + */ +state_offset = 0; +do { +size_t this_chunk_size = MIN(remaining, chunk_size); + +if (qemu_get_buffer(f, shm_buf, this_chunk_size) < this_chunk_size) { +ret = -EINVAL; +goto fail; +} + +
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 01/03/2023 17:52, Michael S. Tsirkin wrote: On Wed, Mar 01, 2023 at 05:40:09PM +0200, Anton Kuchin wrote: So catching errors is not the only purpose of this property, but it definitely allows us to catch some obvious ones. OK, let's say this. If migration=external then migration just works. If migration=internal it fails for now. We are agreed here, right? Our argument is whether to check on load or save? I propose this compromise: two properties, migration-load and migration-save.
migration-load: how incoming migration is handled. internal - through qemu; external - through the daemon. Checked in pre-load.
migration-save: how outgoing migration is handled. internal - through qemu; external - through the daemon. Checked in post-save.
This way, whether the check is on source or destination or both is up to the user. Hmm? Stefan, what do you think about this idea?
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 01/03/2023 19:17, Michael S. Tsirkin wrote: On Wed, Mar 01, 2023 at 06:04:31PM +0200, Anton Kuchin wrote: On 01/03/2023 17:24, Michael S. Tsirkin wrote: On Wed, Mar 01, 2023 at 05:07:28PM +0200, Anton Kuchin wrote: On 28/02/2023 23:24, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote: On 28/02/2023 16:57, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote: I really don't understand why and what you want to check on the destination. Yes, I understand your patch controls the source. Let me try to rephrase why I think it's better on the destination. Here's my understanding - With vhost-user-fs the state lives inside an external daemon. A- If after load you connect to the same daemon you can get migration mostly for free. B- If you connect to a different daemon then that daemon will need the information passed from the original one. Is this a fair summary? The current solution is to set a flag on the source meaning "I have an orchestration tool that will make sure that either A or B is correct". However, both A and B can only be known when the destination is known. Especially as long as what we are really trying to do is just allow qemu restarts, checking the flag on load will thus achieve it in a cleaner way, in that the orchestration tool can reasonably keep the flag clear normally and only set it if restarting qemu locally. By comparison, with your approach the orchestration tool will have to either always set the flag (risky, since then we lose the extra check that we coded) or keep it clear and set it before migration (complex). I hope I explained what and why I want to check. I am far from a vhost-user-fs expert so maybe I am wrong, but I wanted to make sure I got the point across even if others disagree. Thank you for the explanation. Now I understand your concerns. You are right about this mechanism being a bit risky if the orchestrator is not using it properly, or clunky if it is used in the safest possible way.
That's why the first attempt at this feature used a migration capability to let the orchestrator choose the behavior right at the moment of migration. But it has its own problems. We can't move this check only to the destination, because one of the main goals was to prevent orchestrators that are unaware of vhost-user-fs specifics from accidentally migrating such VMs. We can't rely here entirely on the destination to block this, because if the VM is migrated to a file and then can't be loaded by the destination there is no way to fall back and resume the source, so we need to have some kind of blocker on the source by default. Interesting. Why is there no way? Just load it back on the source? Isn't this how any other load failure is managed? Because for sure you need to manage these, they will happen. Because the source can be already terminated. So start it again. What is the difference between restarting the source and restarting the destination to retry migration? If the stream is correct it can be loaded by the destination; if it is broken it won't be accepted at the source either. No. First, the destination has a different qemu version. Second, the file can be corrupted in transfer. Third, the transfer can fail. Etc ... and if load is not supported by the orchestrator and backend the stream can't be loaded on the source either. How can an orchestrator not support load but support migration? I was talking about orchestrators that rely on the old device behavior of blocking migration. They could attempt migration anyway and check if it was blocked; that is far from ideal but was OK and safe, and now this becomes dangerous because state can be lost and the VM becomes unloadable. So we need to ensure that only orchestrators that know what they are doing and explicitly enable the feature are allowed to start migration. that seems par for the course - if you want to use a feature you had better have an idea about how to do it.
If the orchestrator is doing things like migrating to a file and then scp-ing that file, then it had better be prepared to restart the VM on the source, because sometimes it will fail on the destination. And an orchestrator that is not clever enough to do that just should not come up with funky ways to do migration. That said, checking on destination would need another flag, and the safe way of using this feature would require managing two flags instead of one, making it even more fragile. So I'd prefer not to make it more complex. In my opinion the best way for an orchestrator to use this property is to leave the default unmigratable behavior at start and, just before migration when the destination is known, enumerate all vhost-user-fs devices and set the properties according to their backends' capabilities with QMP like you mentioned. This gives us a single point of making the decision for each device and avoids guessing the future at VM start. this means that you need to remember what the values were and then any failure on destination requires you to go back and set them to original valu
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 01/03/2023 17:52, Michael S. Tsirkin wrote: On Wed, Mar 01, 2023 at 05:40:09PM +0200, Anton Kuchin wrote: So catching errors is not the only purpose of this property, but it definitely allows us to catch some obvious ones. OK, let's say this. If migration=external then migration just works. If migration=internal it fails for now. We are agreed here, right? Our argument is whether to check on load or save? I propose this compromise: two properties, migration-load and migration-save.
migration-load: how incoming migration is handled. internal - through qemu; external - through the daemon. Checked in pre-load.
migration-save: how outgoing migration is handled. internal - through qemu; external - through the daemon. Checked in post-save.
This way, whether the check is on source or destination or both is up to the user. Hmm? Hmm, sounds interesting, this could be a really good compromise. So migration-save will be "none" by default to protect unaware orchestrators. What do you think - should migration-load be "none" too, to force the orchestrator to set the proper incoming migration type?
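A toy model of how the proposed pair of properties could divide the work (plain C with made-up names, not QEMU's actual VMState machinery; in QEMU the two checks would sit in the device's .post_save and .pre_load callbacks, which return 0 on success and a negative value to fail the migration):

```c
#include <assert.h>

/*
 * Hypothetical per-direction migration policy mirroring the proposal:
 * "none" rejects, "external" delegates state transfer to the daemon,
 * "internal" (state in the QEMU stream) is not implemented yet.
 */
enum mig_type { MIG_NONE, MIG_EXTERNAL, MIG_INTERNAL };

struct vuf_policy {
    enum mig_type migration_save;   /* checked on the source, post-save */
    enum mig_type migration_load;   /* checked on the destination, pre-load */
};

/* Returns 0 to allow, -1 to fail migration (QEMU hook convention). */
static int check_type(enum mig_type t)
{
    switch (t) {
    case MIG_EXTERNAL:
        return 0;           /* daemon carries the state; stream is fine */
    case MIG_NONE:
    case MIG_INTERNAL:      /* internal migration: fails for now */
    default:
        return -1;
    }
}

static int post_save(const struct vuf_policy *p)
{
    return check_type(p->migration_save);   /* source-side check */
}

static int pre_load(const struct vuf_policy *p)
{
    return check_type(p->migration_load);   /* destination-side check */
}
```

With both defaulting to "none", an unaware orchestrator is blocked on whichever side it touches first; a clever one enables exactly the direction it has arranged for.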
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 01/03/2023 17:24, Michael S. Tsirkin wrote: On Wed, Mar 01, 2023 at 05:07:28PM +0200, Anton Kuchin wrote: On 28/02/2023 23:24, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote: On 28/02/2023 16:57, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote: I really don't understand why and what you want to check on the destination. Yes, I understand your patch controls the source. Let me try to rephrase why I think it's better on the destination. Here's my understanding - With vhost-user-fs the state lives inside an external daemon. A- If after load you connect to the same daemon you can get migration mostly for free. B- If you connect to a different daemon then that daemon will need the information passed from the original one. Is this a fair summary? The current solution is to set a flag on the source meaning "I have an orchestration tool that will make sure that either A or B is correct". However, both A and B can only be known when the destination is known. Especially as long as what we are really trying to do is just allow qemu restarts, checking the flag on load will thus achieve it in a cleaner way, in that the orchestration tool can reasonably keep the flag clear normally and only set it if restarting qemu locally. By comparison, with your approach the orchestration tool will have to either always set the flag (risky, since then we lose the extra check that we coded) or keep it clear and set it before migration (complex). I hope I explained what and why I want to check. I am far from a vhost-user-fs expert so maybe I am wrong, but I wanted to make sure I got the point across even if others disagree. Thank you for the explanation. Now I understand your concerns. You are right about this mechanism being a bit risky if the orchestrator is not using it properly, or clunky if it is used in the safest possible way.
That's why the first attempt at this feature used a migration capability to let the orchestrator choose the behavior right at the moment of migration. But it has its own problems. We can't move this check only to the destination, because one of the main goals was to prevent orchestrators that are unaware of vhost-user-fs specifics from accidentally migrating such VMs. We can't rely here entirely on the destination to block this, because if the VM is migrated to a file and then can't be loaded by the destination there is no way to fall back and resume the source, so we need to have some kind of blocker on the source by default. Interesting. Why is there no way? Just load it back on the source? Isn't this how any other load failure is managed? Because for sure you need to manage these, they will happen. Because the source can be already terminated. So start it again. What is the difference between restarting the source and restarting the destination to retry migration? If the stream is correct it can be loaded by the destination; if it is broken it won't be accepted at the source either. and if load is not supported by the orchestrator and backend the stream can't be loaded on the source either. How can an orchestrator not support load but support migration? I was talking about orchestrators that rely on the old device behavior of blocking migration. They could attempt migration anyway and check if it was blocked; that is far from ideal but was OK and safe, and now this becomes dangerous because state can be lost and the VM becomes unloadable. So we need to ensure that only orchestrators that know what they are doing and explicitly enable the feature are allowed to start migration. that seems par for the course - if you want to use a feature you had better have an idea about how to do it. If the orchestrator is doing things like migrating to a file and then scp-ing that file, then it had better be prepared to restart the VM on the source, because sometimes it will fail on the destination.
And an orchestrator that is not clever enough to do that just should not come up with funky ways to do migration. That said, checking on destination would need another flag, and the safe way of using this feature would require managing two flags instead of one, making it even more fragile. So I'd prefer not to make it more complex. In my opinion the best way for an orchestrator to use this property is to leave the default unmigratable behavior at start and, just before migration when the destination is known, enumerate all vhost-user-fs devices and set the properties according to their backends' capabilities with QMP like you mentioned. This gives us a single point of making the decision for each device and avoids guessing the future at VM start. this means that you need to remember what the values were and then any failure on destination requires you to go back and set them to original values. With the possibility of crashes on the orchestrator you also need to recall the temporary values in some file ... This is huge complexity, much worse than two flags. Assuming we need two, let's see whether just reload on source is good
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 01/03/2023 16:46, Michael S. Tsirkin wrote: On Wed, Mar 01, 2023 at 05:03:03PM +0300, Vladimir Sementsov-Ogievskiy wrote: On 01.03.23 00:24, Michael S. Tsirkin wrote: That said, checking on destination would need another flag, and the safe way of using this feature would require managing two flags instead of one, making it even more fragile. So I'd prefer not to make it more complex. In my opinion the best way for an orchestrator to use this property is to leave the default unmigratable behavior at start and, just before migration when the destination is known, enumerate all vhost-user-fs devices and set the properties according to their backends' capabilities with QMP like you mentioned. This gives us a single point of making the decision for each device and avoids guessing the future at VM start. this means that you need to remember what the values were and then any failure on destination requires you to go back and set them to original values. Why do we need to restore old values? To get back to where you were before you started migration. For me, this new property is a kind of per-device migration capability. Do we care to restore migration capabilities to the values that they had before setting them for a failed migration? We don't need it, as we just always set capabilities as we want before each migration. Same thing for this new property: just set it properly before migration and you don't need to care about restoring it after a failed migration attempt. If you really trust your management then we can just remove the migration blocker and be done with it. All this song and dance with changing properties is to catch errors. If one has to carefully play with QOM to achieve the desired result then IMHO we have failed in this. To be honest I would prefer just removing the blocker, because if the orchestrator doesn't know what it is doing it has lots of different ways to break things and we can't do anything about it.
Just like vhost-user-scsi, which has allowed migration without additional checks since the day it was introduced and relies on the orchestrator. But migration was blocked for vhost-user-fs when it was initially merged, and it is bad to change this contract now. This property here is not only to block migration by default and catch errors but really to select the migration type. External migration can sometimes be preferred even after internal is implemented, because it requires fewer calls to the backend to extract internal state and less code to execute in order to save and restore daemon state. And this will also allow old VMs that support only external migration to move to internal migration without a reboot someday when it is implemented. So catching errors is not the only purpose of this property, but it definitely allows us to catch some obvious ones. With the possibility of crashes on the orchestrator you also need to recall the temporary values in some file ... This is huge complexity, much worse than two flags. Assuming we need two, let's see whether just reload on source is good enough. -- Best regards, Vladimir
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 28/02/2023 23:24, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 07:59:54PM +0200, Anton Kuchin wrote: On 28/02/2023 16:57, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote: I really don't understand why and what you want to check on the destination. Yes, I understand your patch controls the source. Let me try to rephrase why I think it's better on the destination. Here's my understanding - With vhost-user-fs the state lives inside an external daemon. A- If after load you connect to the same daemon you can get migration mostly for free. B- If you connect to a different daemon then that daemon will need the information passed from the original one. Is this a fair summary? The current solution is to set a flag on the source meaning "I have an orchestration tool that will make sure that either A or B is correct". However, both A and B can only be known when the destination is known. Especially as long as what we are really trying to do is just allow qemu restarts, checking the flag on load will thus achieve it in a cleaner way, in that the orchestration tool can reasonably keep the flag clear normally and only set it if restarting qemu locally. By comparison, with your approach the orchestration tool will have to either always set the flag (risky, since then we lose the extra check that we coded) or keep it clear and set it before migration (complex). I hope I explained what and why I want to check. I am far from a vhost-user-fs expert so maybe I am wrong, but I wanted to make sure I got the point across even if others disagree. Thank you for the explanation. Now I understand your concerns. You are right about this mechanism being a bit risky if the orchestrator is not using it properly, or clunky if it is used in the safest possible way. That's why the first attempt at this feature used a migration capability to let the orchestrator choose the behavior right at the moment of migration. But it has its own problems.
We can't move this check only to the destination, because one of the main goals was to prevent orchestrators that are unaware of vhost-user-fs specifics from accidentally migrating such VMs. We can't rely here entirely on the destination to block this, because if the VM is migrated to a file and then can't be loaded by the destination there is no way to fall back and resume the source, so we need to have some kind of blocker on the source by default. Interesting. Why is there no way? Just load it back on the source? Isn't this how any other load failure is managed? Because for sure you need to manage these, they will happen. Because the source can be already terminated, and if load is not supported by the orchestrator and backend the stream can't be loaded on the source either. So we need to ensure that only orchestrators that know what they are doing and explicitly enable the feature are allowed to start migration. That said, checking on destination would need another flag, and the safe way of using this feature would require managing two flags instead of one, making it even more fragile. So I'd prefer not to make it more complex. In my opinion the best way for an orchestrator to use this property is to leave the default unmigratable behavior at start and, just before migration when the destination is known, enumerate all vhost-user-fs devices and set the properties according to their backends' capabilities with QMP like you mentioned. This gives us a single point of making the decision for each device and avoids guessing the future at VM start. this means that you need to remember what the values were and then any failure on destination requires you to go back and set them to original values. With the possibility of crashes on the orchestrator you also need to recall the temporary values in some file ... This is huge complexity, much worse than two flags. Assuming we need two, let's see whether just reload on source is good enough. Reload on source can't be guaranteed to work either.
And even if we could guarantee it to work, we would also need to set up its incoming migration type in case the outgoing migration fails. If the orchestrator crashes and restarts it can revert the flags for all devices, or it can rely on the next migration code to set them up correctly, because they have no effect between migrations anyway. Reverting a migration that failed on the destination is not an easy task either. It seems to be much more complicated than refusing to migrate on the source. I believe we should perform sanity checks if we have the data, but engineering additional checks and putting extra restrictions just to prevent the orchestrator from doing wrong things is overkill. But allowing setup via the command line is valid too, because some backends may always be capable of external migration independent of the hosts and don't need the manipulations with QMP before migration at all. I am much more worried that the realistic scenario is hard to manage safely than about theoretical state-migrating backends that don't exist.
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 28/02/2023 16:57, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 04:30:36PM +0200, Anton Kuchin wrote: I really don't understand why and what you want to check on the destination. Yes, I understand your patch controls the source. Let me try to rephrase why I think it's better on the destination. Here's my understanding - With vhost-user-fs the state lives inside an external daemon. A- If after load you connect to the same daemon you can get migration mostly for free. B- If you connect to a different daemon then that daemon will need the information passed from the original one. Is this a fair summary? The current solution is to set a flag on the source meaning "I have an orchestration tool that will make sure that either A or B is correct". However, both A and B can only be known when the destination is known. Especially as long as what we are really trying to do is just allow qemu restarts, checking the flag on load will thus achieve it in a cleaner way, in that the orchestration tool can reasonably keep the flag clear normally and only set it if restarting qemu locally. By comparison, with your approach the orchestration tool will have to either always set the flag (risky, since then we lose the extra check that we coded) or keep it clear and set it before migration (complex). I hope I explained what and why I want to check. I am far from a vhost-user-fs expert so maybe I am wrong, but I wanted to make sure I got the point across even if others disagree. Thank you for the explanation. Now I understand your concerns. You are right about this mechanism being a bit risky if the orchestrator is not using it properly, or clunky if it is used in the safest possible way. That's why the first attempt at this feature used a migration capability to let the orchestrator choose the behavior right at the moment of migration. But it has its own problems. We can't move this check only to the destination, because one of the main goals was to prevent orchestrators that are unaware of vhost-user-fs specifics from accidentally migrating such VMs.
We can't rely here entirely on the destination to block this, because if the VM is migrated to a file and then can't be loaded by the destination there is no way to fall back and resume the source, so we need to have some kind of blocker on the source by default. That said, checking on destination would need another flag, and the safe way of using this feature would require managing two flags instead of one, making it even more fragile. So I'd prefer not to make it more complex. In my opinion the best way for an orchestrator to use this property is to leave the default unmigratable behavior at start and, just before migration when the destination is known, enumerate all vhost-user-fs devices and set the properties according to their backends' capabilities with QMP like you mentioned. This gives us a single point of making the decision for each device and avoids guessing the future at VM start. But allowing setup via the command line is valid too, because some backends may always be capable of external migration independent of the hosts and don't need the manipulations with QMP before migration at all.
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 24/02/2023 10:47, Michael S. Tsirkin wrote: On Thu, Feb 23, 2023 at 04:24:43PM -0500, Stefan Hajnoczi wrote: On Thu, Feb 23, 2023 at 02:36:33AM -0500, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 03:21:42PM -0500, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 08:25:19PM +0200, Anton Kuchin wrote: On 22/02/2023 19:12, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote: On 22/02/2023 18:51, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote: On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote:
+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+        error_report("Migration is blocked by device %s", path);
+        break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+        return 0;
+    default:
+        error_report("Migration type '%s' is not supported by device %s",
+                     VhostUserMigrationType_str(fs->migration_type), path);
+        break;
+    }
+
+    return -1;
+}
Should we also add this as .pre_load, to force the user to select the correct migration_type on the target too? In fact, I would claim we only want pre_load. When qemu is started on the destination we know where it's migrated from, so this flag can be set. When qemu is started on the source we generally do not yet know, so we don't know whether it's safe to set this flag. But the destination is a "source" for the next migration, so there shouldn't be a real difference. The new property has ".realized_set_allowed = true", so, as I understand it, it may be changed at any time, so that's not a problem. Yes, exactly. So the destination's property sets not how it will handle this incoming migration but the future outgoing one. How do you know where you are going to migrate though? I think you don't. Setting it on the source is better since we know where we are migrating from.
Yes, I don't know where I'm going to migrate to. This is why the property affects only how the source saves state on outgoing migration. Um. I don't get the logic. For this feature to work we need an orchestrator to manage the migration. And we generally assume that it is the responsibility of the orchestrator to ensure matching properties on source and destination. As the orchestrator manages both sides of the migration it can set the option (and we can check it) on either source or destination. Right now it's not important which side we select, because for now the option is essentially a binary allow/deny (but IMHO it is much better to refuse to migrate on the source than to find out later that the state can't be loaded by the destination; in the case of file migration this becomes especially painful). But there are plans to add an internal migration option (extract FUSE state from the backend and transfer it in the QEMU migration stream), and that's where setting/checking on the source becomes important, because it will rely on this property to decide whether extra state from the backend needs to be put in the migration stream subsection. If we do internal migration that will be a different property which has to match on source *and* destination. If you are concerned about the orchestrator breaking the assumption of matching properties on source and destination, this is not really supported AFAIK, but I don't think we need to punish it for this; maybe it has its reasons: I can imagine a scenario where the orchestrator could want to migrate from a source with 'migration=external' to a destination with 'migration=none' to ensure that the destination can't be migrated further. No. I am concerned about a simple practical matter:
- I decide to restart qemu on the same host - so I need to enable migration
- Later I decide to migrate qemu to another host - this should be blocked
A property on the source does not satisfy both at the same time. A property on the destination does. Stefan, what's your take on this? Should we move this from the save to the load hook?
This property can be changed on the source at runtime via qom-set, so you don't need to predict the future. The device can be started from an incoming migration with "external" and then set to "stateful" migration to migrate to another host later on. And then you are supposed to change it back if migration fails? External tool failures have to be handled ... How likely is all this fragile dance not to have bugs? And all of this effort for what? Just to make the case of a buggy management tool fail a bit faster - is it really worth it? Compare that to setting it on the destination, where you can set it on the command line or through qom and forget about it. No? I really don't understand why and what you want to check on destination. This option is supposed to control outgoing migratio
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 24/02/2023 06:14, Anton Kuchin wrote: On 23/02/2023 23:24, Stefan Hajnoczi wrote: On Thu, Feb 23, 2023 at 02:36:33AM -0500, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 03:21:42PM -0500, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 08:25:19PM +0200, Anton Kuchin wrote: On 22/02/2023 19:12, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote: On 22/02/2023 18:51, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote: On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote:
+static int vhost_user_fs_pre_save(void *opaque)
+{
+    VHostUserFS *fs = opaque;
+    g_autofree char *path = object_get_canonical_path(OBJECT(fs));
+
+    switch (fs->migration_type) {
+    case VHOST_USER_MIGRATION_TYPE_NONE:
+        error_report("Migration is blocked by device %s", path);
+        break;
+    case VHOST_USER_MIGRATION_TYPE_EXTERNAL:
+        return 0;
+    default:
+        error_report("Migration type '%s' is not supported by device %s",
+                     VhostUserMigrationType_str(fs->migration_type), path);
+        break;
+    }
+
+    return -1;
+}
Should we also add this as .pre_load, to force the user to select the correct migration_type on the target too? In fact, I would claim we only want pre_load. When qemu is started on the destination we know where it's migrated from, so this flag can be set. When qemu is started on the source we generally do not yet know, so we don't know whether it's safe to set this flag. But the destination is a "source" for the next migration, so there shouldn't be a real difference. The new property has ".realized_set_allowed = true", so, as I understand it, it may be changed at any time, so that's not a problem. Yes, exactly. So the destination's property sets not how it will handle this incoming migration but the future outgoing one. How do you know where you are going to migrate though? I think you don't. Setting it on the source is better since we know where we are migrating from. Yes, I don't know where I'm going to migrate to.
This is why the property affects only how the source saves state on outgoing migration. Um. I don't get the logic. For this feature to work we need an orchestrator to manage the migration. And we generally assume that it is the responsibility of the orchestrator to ensure matching properties on source and destination. As the orchestrator manages both sides of the migration it can set the option (and we can check it) on either source or destination. Right now it's not important which side we select, because for now the option is essentially a binary allow/deny (but IMHO it is much better to refuse to migrate on the source than to find out later that the state can't be loaded by the destination; in the case of file migration this becomes especially painful). But there are plans to add an internal migration option (extract FUSE state from the backend and transfer it in the QEMU migration stream), and that's where setting/checking on the source becomes important, because it will rely on this property to decide whether extra state from the backend needs to be put in the migration stream subsection. If we do internal migration that will be a different property which has to match on source *and* destination. If you are concerned about the orchestrator breaking the assumption of matching properties on source and destination, this is not really supported AFAIK, but I don't think we need to punish it for this; maybe it has its reasons: I can imagine a scenario where the orchestrator could want to migrate from a source with 'migration=external' to a destination with 'migration=none' to ensure that the destination can't be migrated further. No. I am concerned about a simple practical matter:
- I decide to restart qemu on the same host - so I need to enable migration
- Later I decide to migrate qemu to another host - this should be blocked
A property on the source does not satisfy both at the same time. A property on the destination does. Stefan, what's your take on this? Should we move this from the save to the load hook?
This property can be changed on the source at runtime via qom-set, so you don't need to predict the future. The device can be started from an incoming migration with "external" and then set to "stateful" migration to migrate to another host later on. Anton, can you share the virtiofsd patches so we have a full picture of how "external" migration works? I'd like to understand the workflow and also how it can be extended when "stateful" migration is added. Unfortunately the internal implementation relies heavily on our infrastructure, and rust virtiofsd still lacks dirty tracking, so it is not ready yet. But I did have a PoC for the now-deprecated C virtiofsd that I didn't bother to prepare for upstreaming because the C version was declared unsupported
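The runtime flip described here would be a plain qom-set over QMP. A sketch, assuming a device created with -device vhost-user-fs-pci,id=vufs0,... (the id is illustrative, and the property name is taken from this patch's migration_type field; the exact QOM path depends on how the device was created):

```json
{ "execute": "qom-set",
  "arguments": {
      "path": "/machine/peripheral/vufs0",
      "property": "migration",
      "value": "external"
  }
}
```

The orchestrator would issue this on the source just before migrate, once it knows the destination can reattach to the same (or a state-carrying) virtiofsd, and leave the blocking default in place otherwise.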
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 23/02/2023 23:24, Stefan Hajnoczi wrote: On Thu, Feb 23, 2023 at 02:36:33AM -0500, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 03:21:42PM -0500, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 08:25:19PM +0200, Anton Kuchin wrote: On 22/02/2023 19:12, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote: On 22/02/2023 18:51, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote: On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote: +static int vhost_user_fs_pre_save(void *opaque) +{ + VHostUserFS *fs = opaque; + g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + + switch (fs->migration_type) { + case VHOST_USER_MIGRATION_TYPE_NONE: + error_report("Migration is blocked by device %s", path); + break; + case VHOST_USER_MIGRATION_TYPE_EXTERNAL: + return 0; + default: + error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); + break; + } + + return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? In fact, I would claim we only want pre_load. When qemu is started on destination we know where it's migrated from so this flag can be set. When qemu is started on source we generally do not yet know so we don't know whether it's safe to set this flag. But destination is a "source" for next migration, so there shouldn't be real difference. The new property has ".realized_set_allowed = true", so, as I understand it may be changed at any time, so that's not a problem. Yes, exactly. So destination's property sets not how it will handle this incoming migration but the future outgoing one. How do you know where you are going to migrate though? I think you don't. Setting it on source is better since we know where we are migrating from. Yes, I don't know where I'm going to migrate to. 
This is why property affects only how source saves state on outgoing migration. Um. I don't get the logic. For this feature to work we need orchestrator to manage the migration. And we generally assume that it is responsibility of orchestrator to ensure matching properties on source and destination. As orchestrator manages both sides of migration it can set option (and we can check it) on either source or destination. Now it's not important which side we select, because now the option is essentially binary allow/deny (but IMHO it is much better to refuse source to migrate than find later that state can't be loaded by destination, in case of file migration this becomes especially painful). But there are plans to add internal migration option (extract FUSE state from backend and transfer it in QEMU migration stream), and that's where setting/checking on source becomes important because it will rely on this property to decide if extra state from backend needs to be put in the migration stream subsection. If we do internal migration that will be a different property which has to match on source *and* destination. If you are concerned about orchestrator breaking assumption of matching properties on source and destination this is not really supported AFAIK but I don't think we need to punish it for this, maybe it has its reasons: I can imagine a scenario where orchestrator could want to migrate from source with 'migration=external' to destination with 'migration=none' to ensure that destination can't be migrated further. No. I am concerned about a simple practical matter: - I decide to restart qemu on the same host - so I need to enable migration - Later I decide to migrate qemu to another host - this should be blocked Property on source does not satisfy both at the same time. Property on destination does. Stefan what's your take on this? Should we move this from save to load hook?
This property can be changed on the source at runtime via qom-set, so you don't need to predict the future. The device can be started from an incoming migration with "external" and then set to "stateful" migration to migrate to another host later on. Anton, can you share the virtiofsd patches so we have a full picture of how "external" migration works? I'd like to understand the workflow and also how it can be extended when "stateful" migration is added. Unfortunately the internal implementation relies heavily on our infrastructure, and rust virtiofsd still lacks dirty tracking so it is not ready yet. But I did have a PoC for the now-deprecated C virtiofsd that I didn't bother to prepare for upstreaming because the C version was declared unsupported. It essentially adds reconnect and this was the only thing required from
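For reference, a runtime change of the policy as described above could be issued over QMP roughly like this. The property name "migration" and the "external" value come from the patch; the device path /machine/peripheral/vufs0 is an assumption for illustration, and "stateful" is not implemented yet:

```json
{ "execute": "qom-set",
  "arguments": { "path": "/machine/peripheral/vufs0",
                 "property": "migration",
                 "value": "external" } }
```

Because the property has .realized_set_allowed = true, this can be done on a running (realized) device, before starting the outgoing migration.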
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 22/02/2023 22:21, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 08:25:19PM +0200, Anton Kuchin wrote: On 22/02/2023 19:12, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote: On 22/02/2023 18:51, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote: On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote: +static int vhost_user_fs_pre_save(void *opaque) +{ + VHostUserFS *fs = opaque; + g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + + switch (fs->migration_type) { + case VHOST_USER_MIGRATION_TYPE_NONE: + error_report("Migration is blocked by device %s", path); + break; + case VHOST_USER_MIGRATION_TYPE_EXTERNAL: + return 0; + default: + error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); + break; + } + + return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? In fact, I would claim we only want pre_load. When qemu is started on destination we know where it's migrated from so this flag can be set. When qemu is started on source we generally do not yet know so we don't know whether it's safe to set this flag. But destination is a "source" for next migration, so there shouldn't be real difference. The new property has ".realized_set_allowed = true", so, as I understand it may be changed at any time, so that's not a problem. Yes, exactly. So destination's property sets not how it will handle this incoming migration but the future outgoing one. How do you know where you are going to migrate though? I think you don't. Setting it on source is better since we know where we are migrating from. Yes, I don't know where I'm going to migrate to. This is why property affects only how source saves state on outgoing migration. Um. I don't get the logic. For this feature to work we need orchestrator to manage the migration. 
And we generally assume that it is responsibility of orchestrator to ensure matching properties on source and destination. As orchestrator manages both sides of migration it can set option (and we can check it) on either source or destination. Now it's not important which side we select, because now the option is essentially binary allow/deny (but IMHO it is much better to refuse source to migrate than find later that state can't be loaded by destination, in case of file migration this becomes especially painful). But there are plans to add internal migration option (extract FUSE state from backend and transfer it in QEMU migration stream), and that's where setting/checking on source becomes important because it will rely on this property to decide if extra state from backend needs to be put in the migration stream subsection. If we do internal migration that will be a different property which has to match on source *and* destination. I'm not sure if we need another property. Initial idea was to allow orchestrator setup which part of state qemu should put to stream that will be sufficient to restore VM on destination. But this depends on how external migration will be implemented. If you are concerned about orchestrator breaking assumption of matching properties on source and destination this is not really supported AFAIK but I don't think we need to punish it for this, maybe it has its reasons: I can imagine a scenario where orchestrator could want to migrate from source with 'migration=external' to destination with 'migration=none' to ensure that destination can't be migrated further. No. I am concerned about a simple practical matter: - I decide to restart qemu on the same host - so I need to enable migration - Later I decide to migrate qemu to another host - this should be blocked Property on source does not satisfy both at the same time. Property on destination does.
If destination QEMUs on local and remote hosts have same properties how can we write check that passes on the same host and fails on remote? Sorry, I don't understand how qemu can help to handle this. It knows nothing about the hosts so this is the responsibility of management software to know where it can migrate and configure it appropriately. Maybe I didn't understand your scenario or what you propose to check on destination. Could you explain a bit more? This property selects if VM can migrate and if it can what should qemu put to the migration stream. So we select on source what type of migration is allowed for this VM, destination can't check anything at load time. OK, so the new field "migration" regulates only outgoing migration and do nothing for incoming. On incoming migration the migration
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 22/02/2023 19:12, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 07:05:47PM +0200, Anton Kuchin wrote: On 22/02/2023 18:51, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote: On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote: +static int vhost_user_fs_pre_save(void *opaque) +{ + VHostUserFS *fs = opaque; + g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + + switch (fs->migration_type) { + case VHOST_USER_MIGRATION_TYPE_NONE: + error_report("Migration is blocked by device %s", path); + break; + case VHOST_USER_MIGRATION_TYPE_EXTERNAL: + return 0; + default: + error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); + break; + } + + return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? In fact, I would claim we only want pre_load. When qemu is started on destination we know where it's migrated from so this flag can be set. When qemu is started on source we generally do not yet know so we don't know whether it's safe to set this flag. But destination is a "source" for next migration, so there shouldn't be real difference. The new property has ".realized_set_allowed = true", so, as I understand it may be changed at any time, so that's not a problem. Yes, exactly. So destination's property sets not how it will handle this incoming migration but the future outgoing one. How do you know where you are going to migrate though? I think you don't. Setting it on source is better since we know where we are migrating from. Yes, I don't know where I'm going to migrate to. This is why property affects only how source saves state on outgoing migration. Um. I don't get the logic. For this feature to work we need orchestrator to manage the migration. 
And we generally assume that it is responsibility of orchestrator to ensure matching properties on source and destination. As orchestrator manages both sides of migration it can set option (and we can check it) on either source or destination. Now it's not important which side we select, because now the option is essentially binary allow/deny (but IMHO it is much better to refuse source to migrate than find later that state can't be loaded by destination, in case of file migration this becomes especially painful). But there are plans to add internal migration option (extract FUSE state from backend and transfer it in QEMU migration stream), and that's where setting/checking on source becomes important because it will rely on this property to decide if extra state from backend needs to be put in the migration stream subsection. If you are concerned about orchestrator breaking assumption of matching properties on source and destination this is not really supported AFAIK but I don't think we need to punish it for this, maybe it has its reasons: I can imagine a scenario where orchestrator could want to migrate from source with 'migration=external' to destination with 'migration=none' to ensure that destination can't be migrated further. This property selects if VM can migrate and if it can what should qemu put to the migration stream. So we select on source what type of migration is allowed for this VM, destination can't check anything at load time. OK, so the new field "migration" regulates only outgoing migration and do nothing for incoming. On incoming migration the migration stream itself defines the type of device migration. Worth mentioning in doc? Good point. I don't think this deserves a respin but if I have to send v4 I'll include clarification in it.
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 22/02/2023 18:43, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 06:14:31PM +0300, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote: +static int vhost_user_fs_pre_save(void *opaque) +{ + VHostUserFS *fs = opaque; + g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + + switch (fs->migration_type) { + case VHOST_USER_MIGRATION_TYPE_NONE: + error_report("Migration is blocked by device %s", path); + break; + case VHOST_USER_MIGRATION_TYPE_EXTERNAL: + return 0; + default: + error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); + break; + } + + return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? In fact, I would claim we only want pre_load. When qemu is started on destination we know where it's migrated from so this flag can be set. When qemu is started on source we generally do not yet know so we don't know whether it's safe to set this flag. But destination is a "source" for next migration, so there shouldn't be real difference. And whether to allow that migration should be decided by destination of that migration. Destination can just refuse to load unsupported state. But this happens automatically if migration code finds unknown subsection and needs no explicit check by device .pre_load. The new property has ".realized_set_allowed = true", so, as I understand it may be changed at any time, so that's not a problem. This property selects if VM can migrate and if it can what should qemu put to the migration stream. So we select on source what type of migration is allowed for this VM, destination can't check anything at load time. OK, so the new field "migration" regulates only outgoing migration and do nothing for incoming. On incoming migration the migration stream itself defines the type of device migration. Worth mentioning in doc? -- Best regards, Vladimir
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 22/02/2023 18:51, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 06:49:10PM +0200, Anton Kuchin wrote: On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote: +static int vhost_user_fs_pre_save(void *opaque) +{ + VHostUserFS *fs = opaque; + g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + + switch (fs->migration_type) { + case VHOST_USER_MIGRATION_TYPE_NONE: + error_report("Migration is blocked by device %s", path); + break; + case VHOST_USER_MIGRATION_TYPE_EXTERNAL: + return 0; + default: + error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); + break; + } + + return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? In fact, I would claim we only want pre_load. When qemu is started on destination we know where it's migrated from so this flag can be set. When qemu is started on source we generally do not yet know so we don't know whether it's safe to set this flag. But destination is a "source" for next migration, so there shouldn't be real difference. The new property has ".realized_set_allowed = true", so, as I understand it may be changed at any time, so that's not a problem. Yes, exactly. So destination's property sets not how it will handle this incoming migration but the future outgoing one. How do you know where you are going to migrate though? I think you don't. Setting it on source is better since we know where we are migrating from. Yes, I don't know where I'm going to migrate to. This is why property affects only how source saves state on outgoing migration. This property selects if VM can migrate and if it can what should qemu put to the migration stream. So we select on source what type of migration is allowed for this VM, destination can't check anything at load time. OK, so the new field "migration" regulates only outgoing migration and do nothing for incoming. 
On incoming migration the migration stream itself defines the type of device migration. Worth mentioning in doc? Good point. I don't think this deserves a respin but if I have to send v4 I'll include clarification in it.
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 22/02/2023 17:14, Vladimir Sementsov-Ogievskiy wrote: On 22.02.23 17:25, Anton Kuchin wrote: +static int vhost_user_fs_pre_save(void *opaque) +{ + VHostUserFS *fs = opaque; + g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + + switch (fs->migration_type) { + case VHOST_USER_MIGRATION_TYPE_NONE: + error_report("Migration is blocked by device %s", path); + break; + case VHOST_USER_MIGRATION_TYPE_EXTERNAL: + return 0; + default: + error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); + break; + } + + return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? In fact, I would claim we only want pre_load. When qemu is started on destination we know where it's migrated from so this flag can be set. When qemu is started on source we generally do not yet know so we don't know whether it's safe to set this flag. But destination is a "source" for next migration, so there shouldn't be real difference. The new property has ".realized_set_allowed = true", so, as I understand it may be changed at any time, so that's not a problem. Yes, exactly. So destination's property sets not how it will handle this incoming migration but the future outgoing one. This property selects if VM can migrate and if it can what should qemu put to the migration stream. So we select on source what type of migration is allowed for this VM, destination can't check anything at load time. OK, so the new field "migration" regulates only outgoing migration and do nothing for incoming. On incoming migration the migration stream itself defines the type of device migration. Worth mentioning in doc? Good point. I don't think this deserves a respin but if I have to send v4 I'll include clarification in it.
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 22/02/2023 14:43, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 03:20:00PM +0300, Vladimir Sementsov-Ogievskiy wrote: On 17.02.23 20:00, Anton Kuchin wrote: Migration of vhost-user-fs device requires transfer of FUSE internal state from backend. There is no standard way to do it now so by default migration must be blocked. But if this state can be externally transferred by orchestrator give it an option to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/core/qdev-properties-system.c| 10 + hw/virtio/vhost-user-fs.c | 32 - include/hw/qdev-properties-system.h | 1 + include/hw/virtio/vhost-user-fs.h | 2 ++ qapi/migration.json | 16 +++ 5 files changed, 60 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index d42493f630..d9b1aa2a5d 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = { .set = set_uuid, .set_default_value = set_default_uuid_auto, }; + +const PropertyInfo qdev_prop_vhost_user_migration_type = { +.name = "VhostUserMigrationType", +.description = "none/external", +.enum_table = &VhostUserMigrationType_lookup, +.get = qdev_propinfo_get_enum, +.set = qdev_propinfo_set_enum, +.set_default_value = qdev_propinfo_set_default_value_enum, +.realized_set_allowed = true, +}; diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 83fc20e49e..7deb9df5ec 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -298,9 +298,35 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +VHostUserFS *fs = opaque; +g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + +switch (fs->migration_type) { +case VHOST_USER_MIGRATION_TYPE_NONE: +error_report("Migration is blocked by device %s", path); +break; +case VHOST_USER_MIGRATION_TYPE_EXTERNAL: +return 0; +default: 
+error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); +break; +} + +return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? In fact, I would claim we only want pre_load. When qemu is started on destination we know where it's migrated from so this flag can be set. When qemu is started on source we generally do not yet know so we don't know whether it's safe to set this flag. This property selects if VM can migrate and if it can what should qemu put to the migration stream. So we select on source what type of migration is allowed for this VM, destination can't check anything at load time. + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { @@ -309,6 +335,10 @@ static Property vuf_properties[] = { DEFINE_PROP_UINT16("num-request-queues", VHostUserFS, conf.num_request_queues, 1), DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128), +DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type, + VHOST_USER_MIGRATION_TYPE_NONE, + qdev_prop_vhost_user_migration_type, + VhostUserMigrationType), 1. I see, other similar qdev_prop_* use DEFINE_PROP_SIGNED 2. All of them except only qdev_prop_fdc_drive_type, define also a convenient macro in include/hw/qdev-properties-system.h should we follow these patterns? -- Best regards, Vladimir
Re: [PATCH v3 1/1] vhost-user-fs: add migration type property
On 22/02/2023 14:20, Vladimir Sementsov-Ogievskiy wrote: On 17.02.23 20:00, Anton Kuchin wrote: Migration of vhost-user-fs device requires transfer of FUSE internal state from backend. There is no standard way to do it now so by default migration must be blocked. But if this state can be externally transferred by orchestrator give it an option to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/core/qdev-properties-system.c | 10 + hw/virtio/vhost-user-fs.c | 32 - include/hw/qdev-properties-system.h | 1 + include/hw/virtio/vhost-user-fs.h | 2 ++ qapi/migration.json | 16 +++ 5 files changed, 60 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index d42493f630..d9b1aa2a5d 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = { .set = set_uuid, .set_default_value = set_default_uuid_auto, }; + +const PropertyInfo qdev_prop_vhost_user_migration_type = { + .name = "VhostUserMigrationType", + .description = "none/external", + .enum_table = &VhostUserMigrationType_lookup, + .get = qdev_propinfo_get_enum, + .set = qdev_propinfo_set_enum, + .set_default_value = qdev_propinfo_set_default_value_enum, + .realized_set_allowed = true, +}; diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 83fc20e49e..7deb9df5ec 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -298,9 +298,35 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ + VHostUserFS *fs = opaque; + g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + + switch (fs->migration_type) { + case VHOST_USER_MIGRATION_TYPE_NONE: + error_report("Migration is blocked by device %s", path); + break; + case VHOST_USER_MIGRATION_TYPE_EXTERNAL: + return 0; + default: + error_report("Migration type '%s' is not supported by device %s", + 
VhostUserMigrationType_str(fs->migration_type), path); + break; + } + + return -1; +} Should we also add this as .pre_load, to force user select correct migration_type on target too? Why do we need it? Enum forces user to select at least one of the sane options and I believe this is enough. As this property affects only save, I don't know what we could do at load. + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", - .unmigratable = 1, + .minimum_version_id = 0, + .version_id = 0, + .fields = (VMStateField[]) { + VMSTATE_VIRTIO_DEVICE, + VMSTATE_END_OF_LIST() + }, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { @@ -309,6 +335,10 @@ static Property vuf_properties[] = { DEFINE_PROP_UINT16("num-request-queues", VHostUserFS, conf.num_request_queues, 1), DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128), + DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type, + VHOST_USER_MIGRATION_TYPE_NONE, + qdev_prop_vhost_user_migration_type, + VhostUserMigrationType), 1. I see, other similar qdev_prop_* use DEFINE_PROP_SIGNED I don't think this should be signed. Enum values are non-negative so compilers (at least gcc and clang that I checked) evaluate underlying enum type to be unsigned int. I don't know why other property types use signed, maybe they have reasons, or this is just how they were initially implemented. 2. All of them except only qdev_prop_fdc_drive_type, define also a convenient macro in include/hw/qdev-properties-system.h This makes sense if property is used in more than one place, in this case I don't see any benefit from writing more code to handle this specific case. Maybe if property finds its usage in other devices this can be done. should we follow these patterns?
[PATCH v3 1/1] vhost-user-fs: add migration type property
Migration of vhost-user-fs device requires transfer of FUSE internal state from backend. There is no standard way to do it now so by default migration must be blocked. But if this state can be externally transferred by orchestrator give it an option to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/core/qdev-properties-system.c| 10 + hw/virtio/vhost-user-fs.c | 32 - include/hw/qdev-properties-system.h | 1 + include/hw/virtio/vhost-user-fs.h | 2 ++ qapi/migration.json | 16 +++ 5 files changed, 60 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index d42493f630..d9b1aa2a5d 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = { .set = set_uuid, .set_default_value = set_default_uuid_auto, }; + +const PropertyInfo qdev_prop_vhost_user_migration_type = { +.name = "VhostUserMigrationType", +.description = "none/external", +.enum_table = &VhostUserMigrationType_lookup, +.get = qdev_propinfo_get_enum, +.set = qdev_propinfo_set_enum, +.set_default_value = qdev_propinfo_set_default_value_enum, +.realized_set_allowed = true, +}; diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 83fc20e49e..7deb9df5ec 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -298,9 +298,35 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +VHostUserFS *fs = opaque; +g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + +switch (fs->migration_type) { +case VHOST_USER_MIGRATION_TYPE_NONE: +error_report("Migration is blocked by device %s", path); +break; +case VHOST_USER_MIGRATION_TYPE_EXTERNAL: +return 0; +default: +error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); +break; +} + +return -1; +} + static const 
VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { @@ -309,6 +335,10 @@ static Property vuf_properties[] = { DEFINE_PROP_UINT16("num-request-queues", VHostUserFS, conf.num_request_queues, 1), DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128), +DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type, + VHOST_USER_MIGRATION_TYPE_NONE, + qdev_prop_vhost_user_migration_type, + VhostUserMigrationType), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h index 0ac327ae60..1a67591590 100644 --- a/include/hw/qdev-properties-system.h +++ b/include/hw/qdev-properties-system.h @@ -22,6 +22,7 @@ extern const PropertyInfo qdev_prop_audiodev; extern const PropertyInfo qdev_prop_off_auto_pcibar; extern const PropertyInfo qdev_prop_pcie_link_speed; extern const PropertyInfo qdev_prop_pcie_link_width; +extern const PropertyInfo qdev_prop_vhost_user_migration_type; #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d) \ DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t) diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h index 94c3aaa84e..821b2a121c 100644 --- a/include/hw/virtio/vhost-user-fs.h +++ b/include/hw/virtio/vhost-user-fs.h @@ -19,6 +19,7 @@ #include "hw/virtio/vhost-user.h" #include "chardev/char-fe.h" #include "qom/object.h" +#include "qapi/qapi-types-migration.h" #define TYPE_VHOST_USER_FS "vhost-user-fs-device" OBJECT_DECLARE_SIMPLE_TYPE(VHostUserFS, VHOST_USER_FS) @@ -40,6 +41,7 @@ struct VHostUserFS { VirtQueue **req_vqs; VirtQueue *hiprio_vq; int32_t bootindex; +VhostUserMigrationType migration_type; /*< public >*/ }; diff --git a/qapi/migration.json b/qapi/migration.json index c84fa10e86..78feb20ffc 100644 --- 
a/qapi/migration.json +++ b/qapi/migration.json @@ -2178,3 +2178,19 @@ 'data': { 'job-id': 'str', 'tag': 'str', 'devices': ['str'] } } + +## +# @VhostUserMigrationType: +# +# Type of vhost-user device migration. +# +# @none: Migration is not supported, attempts to migrate with this de
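As a usage sketch, with this patch applied the new property would be set on the QEMU command line roughly as below. The chardev id, socket path, and tag are illustrative assumptions, not from the patch; only the migration=external property is defined by it:

```shell
# vhost-user-fs device whose FUSE state the orchestrator transfers externally
qemu-system-x86_64 ... \
    -chardev socket,id=char0,path=/tmp/vhostqemu \
    -device vhost-user-fs-pci,chardev=char0,tag=myfs,migration=external
```

Omitting migration= (or passing migration=none, the default) keeps the old behaviour: the device blocks outgoing migration in its pre_save hook.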
[PATCH v3 0/1] virtio-fs: implement option for stateless migration.
v3: - Remove migration_type from migration stream - Use enum type for migration_type - Get rid of useless cast - Fix typos - Reword commit message v2: - Use device property instead of migration capability Anton Kuchin (1): vhost-user-fs: add migration type property hw/core/qdev-properties-system.c| 10 + hw/virtio/vhost-user-fs.c | 32 - include/hw/qdev-properties-system.h | 1 + include/hw/virtio/vhost-user-fs.h | 2 ++ qapi/migration.json | 16 +++ 5 files changed, 60 insertions(+), 1 deletion(-) -- 2.37.2
Re: [PATCH v2 1/1] vhost-user-fs: add property to allow migration
/* resend with fixed to: and cc: */ On 16/02/2023 18:22, Juan Quintela wrote: "Michael S. Tsirkin" wrote: On Thu, Feb 16, 2023 at 11:11:22AM -0500, Michael S. Tsirkin wrote: On Thu, Feb 16, 2023 at 03:14:05PM +0100, Juan Quintela wrote: Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But it is good to have an option for orchestrator to tune this according to backend capabilities and migration configuration. This patch adds device property 'migration' that is 'none' by default to keep old behaviour but can be set to 'external' to explicitly allow migration with minimal virtio device state in migration stream if daemon has some way to sync FUSE state on src and dst without help from qemu. Signed-off-by: Anton Kuchin Reviewed-by: Juan Quintela The migration bits are correct. And I can think a better way to explain that one device is migrated externally. I'm bad at wording but I'll try to improve this one. Suggestions will be really appreciated. If you have to respin: +static int vhost_user_fs_pre_save(void *opaque) +{ +VHostUserFS *fs = (VHostUserFS *)opaque; This hack is useless. I will. Will get rid of that, thanks. meaning the cast? yes. I know that there are still lots of code that still have it. Now remember that I have no clue about vhost-user-fs. But this looks fishy static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_UINT8(migration_type, VHostUserFS), +VMSTATE_END_OF_LIST() In fact why do we want to migrate this property? We generally don't, we only migrate state. See previous discussion. In a nutshell, we are going to have internal migration in the future (not done yet). Later, Juan. I think Michael is right. 
We don't need it at destination to know what data is in the stream because subsections will tell us all we need to know. +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { @@ -309,6 +337,10 @@ static Property vuf_properties[] = { DEFINE_PROP_UINT16("num-request-queues", VHostUserFS, conf.num_request_queues, 1), DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128), +DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type, + VHOST_USER_MIGRATION_TYPE_NONE, + qdev_prop_vhost_user_migration_type, + uint8_t), DEFINE_PROP_END_OF_LIST(), We have four properties here (5 with the new migration one), and you only migrate one. This looks fishy, but I don't know if it makes sense. If they _have_ to be configured the same on source and destination, I would transfer them and check in post_load that the values are correct. Later, Juan. Weird suggestion. We generally don't do this kind of check - that would be open-coding each property. It's management's job to make sure things are consistent. -- MST
Re: [PATCH v2 1/1] vhost-user-fs: add property to allow migration
On 16/02/2023 23:09, Stefan Hajnoczi wrote: On Thu, Feb 16, 2023 at 04:00:03PM +0200, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But it is good to have an option for orchestrator to tune this according to backend capabilities and migration configuration. This patch adds device property 'migration' that is 'none' by default to keep old behaviour but can be set to 'external' to explicitly allow migration with minimal virtio device state in migration stream if daemon has some way to sync FUSE state on src and dst without help from qemu. Signed-off-by: Anton Kuchin --- hw/core/qdev-properties-system.c| 10 + hw/virtio/vhost-user-fs.c | 34 - include/hw/qdev-properties-system.h | 1 + include/hw/virtio/vhost-user-fs.h | 1 + qapi/migration.json | 16 ++ 5 files changed, 61 insertions(+), 1 deletion(-) Looks okay to me. Comments below. 
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index d42493f630..d9b1aa2a5d 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = { .set = set_uuid, .set_default_value = set_default_uuid_auto, }; + +const PropertyInfo qdev_prop_vhost_user_migration_type = { +.name = "VhostUserMigrationType", +.description = "none/external", +.enum_table = &VhostUserMigrationType_lookup, +.get = qdev_propinfo_get_enum, +.set = qdev_propinfo_set_enum, +.set_default_value = qdev_propinfo_set_default_value_enum, +.realized_set_allowed = true, +}; diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 83fc20e49e..e2a5b6cfdf 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "qapi/qapi-types-migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,36 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +VHostUserFS *fs = (VHostUserFS *)opaque; +g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + +switch (fs->migration_type) { +case VHOST_USER_MIGRATION_TYPE_NONE: +error_report("Migration is blocked by device %s", path); +break; +case VHOST_USER_MIGRATION_TYPE_EXTERNAL: +return 0; +default: +error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); +break; +} + +return -1; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_UINT8(migration_type, VHostUserFS), Maybe add a comment since Michael asked what the purpose of this field is: /* For verifying that 
source/destination migration= properties match */ VMSTATE_UINT8(migration_type, VHostUserFS), Come to think of it...where is the code that checks the vmstate migration_type field matches the destination device's migration= property? It's a good idea to have a comment here, thanks. This field is not really for verifying that source and destination match. It just describes what data is packed into the stream. In fact this property could be different and this breaks nothing: e.g. when we migrate from version that can export only stateless stream to one that finally supports internal migration. Said that I start thinking that we don't need it in the stream at all because extra data can be detected just by presence of additional subsection. +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { @@ -309,6 +337,10 @@ static Property vuf_properties[] = { DEFINE_PROP_UINT16("num-request-queues", VHostUserFS, conf.num_request_queues, 1), DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128), +DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type, + VHOST_USER_MIGRATION_TYPE_NONE, + qdev_prop_vhost_user_migration_type, + uint8_t), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h index
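Anton's point that the destination can detect extra data purely by the presence of a subsection can be illustrated outside of QEMU's vmstate machinery. The following is a toy model, not the real vmstate API: the tag names and byte layout are invented for illustration, but it shows why a separate `migration_type` field in the stream is redundant when optional state is self-describing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stream format (NOT QEMU's vmstate wire format): a base state byte,
 * then an optional tagged subsection, then an end marker. The receiver
 * learns whether extra state is present from the tag alone.
 */
#define TAG_END   0x00
#define TAG_FUSE  0x01  /* hypothetical "internal FUSE state" subsection */

static size_t toy_save(uint8_t *buf, int with_fuse_state)
{
    size_t n = 0;
    buf[n++] = 0x2a;            /* base virtio device state (stand-in) */
    if (with_fuse_state) {      /* like a subsection's .needed() callback */
        buf[n++] = TAG_FUSE;
        buf[n++] = 0x99;        /* subsection payload (stand-in) */
    }
    buf[n++] = TAG_END;
    return n;
}

static int toy_load(const uint8_t *buf, uint8_t *fuse_state)
{
    size_t n = 0;
    int have_fuse = 0;
    (void)buf[n++];             /* consume base state */
    if (buf[n] == TAG_FUSE) {   /* presence of the tag is the signal */
        n++;
        *fuse_state = buf[n++];
        have_fuse = 1;
    }
    assert(buf[n] == TAG_END);
    return have_fuse;
}
```

In real QEMU code this role is played by a `VMStateDescription` listed in `.subsections` with a `.needed` callback; the subsection is only emitted when needed and the destination keys off its name.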
[PATCH v2 0/1] virtio-fs: implement option for stateless migration.
v2: Use device property instead of migration capability Anton Kuchin (1): vhost-user-fs: add property to allow migration hw/core/qdev-properties-system.c| 10 + hw/virtio/vhost-user-fs.c | 34 - include/hw/qdev-properties-system.h | 1 + include/hw/virtio/vhost-user-fs.h | 1 + qapi/migration.json | 16 ++ 5 files changed, 61 insertions(+), 1 deletion(-) -- 2.37.2
[PATCH v2 1/1] vhost-user-fs: add property to allow migration
Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But it is good to have an option for orchestrator to tune this according to backend capabilities and migration configuration. This patch adds device property 'migration' that is 'none' by default to keep old behaviour but can be set to 'external' to explicitly allow migration with minimal virtio device state in migration stream if daemon has some way to sync FUSE state on src and dst without help from qemu. Signed-off-by: Anton Kuchin --- hw/core/qdev-properties-system.c| 10 + hw/virtio/vhost-user-fs.c | 34 - include/hw/qdev-properties-system.h | 1 + include/hw/virtio/vhost-user-fs.h | 1 + qapi/migration.json | 16 ++ 5 files changed, 61 insertions(+), 1 deletion(-) diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index d42493f630..d9b1aa2a5d 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -1143,3 +1143,13 @@ const PropertyInfo qdev_prop_uuid = { .set = set_uuid, .set_default_value = set_default_uuid_auto, }; + +const PropertyInfo qdev_prop_vhost_user_migration_type = { +.name = "VhostUserMigrationType", +.description = "none/external", +.enum_table = &VhostUserMigrationType_lookup, +.get = qdev_propinfo_get_enum, +.set = qdev_propinfo_set_enum, +.set_default_value = qdev_propinfo_set_default_value_enum, +.realized_set_allowed = true, +}; diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index 83fc20e49e..e2a5b6cfdf 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "qapi/qapi-types-migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,36 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return 
&fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +VHostUserFS *fs = (VHostUserFS *)opaque; +g_autofree char *path = object_get_canonical_path(OBJECT(fs)); + +switch (fs->migration_type) { +case VHOST_USER_MIGRATION_TYPE_NONE: +error_report("Migration is blocked by device %s", path); +break; +case VHOST_USER_MIGRATION_TYPE_EXTERNAL: +return 0; +default: +error_report("Migration type '%s' is not supported by device %s", + VhostUserMigrationType_str(fs->migration_type), path); +break; +} + +return -1; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_UINT8(migration_type, VHostUserFS), +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { @@ -309,6 +337,10 @@ static Property vuf_properties[] = { DEFINE_PROP_UINT16("num-request-queues", VHostUserFS, conf.num_request_queues, 1), DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128), +DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type, + VHOST_USER_MIGRATION_TYPE_NONE, + qdev_prop_vhost_user_migration_type, + uint8_t), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h index 0ac327ae60..1a67591590 100644 --- a/include/hw/qdev-properties-system.h +++ b/include/hw/qdev-properties-system.h @@ -22,6 +22,7 @@ extern const PropertyInfo qdev_prop_audiodev; extern const PropertyInfo qdev_prop_off_auto_pcibar; extern const PropertyInfo qdev_prop_pcie_link_speed; extern const PropertyInfo qdev_prop_pcie_link_width; +extern const PropertyInfo qdev_prop_vhost_user_migration_type; #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d) \ DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t) diff --git a/include/hw/virtio/vhost-user-fs.h b/include/hw/virtio/vhost-user-fs.h index 94c3aaa84e..3ebce77be5 100644 --- 
a/include/hw/virtio/vhost-user-fs.h +++ b/include/hw/virtio/vhost-user-fs.h @@ -40,6 +40,7 @@ struct VHostUserFS { VirtQueue **req_vqs; VirtQueue *hiprio_vq; int32_t bootindex; +uint8_t migration_type; /*< public >*/ }; diff --git a/qapi/migration.json b/qapi/migration.json index c84fa10e86..ababd605a2 100644 --- a/qapi/migration.json +++ b/qapi/migrat
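With the patch applied, an orchestrator opts in per device via the new property. A hypothetical invocation might look as follows — the socket path, chardev id, tag and machine type are made-up examples, only the `migration=external` property comes from the patch:

```shell
# Hypothetical example: the chardev path, id, tag and machine type are
# placeholders; migration=external is the property added by this patch
# (default is migration=none, which keeps the device unmigratable).
qemu-system-x86_64 \
    -machine q35 \
    -chardev socket,id=char0,path=/tmp/vhostfs.sock \
    -device vhost-user-fs-pci,chardev=char0,tag=myfs,migration=external
```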
Re: [PATCH v4 13/16] pci: introduce pci_find_the_only_child()
On 13/02/2023 16:01, Vladimir Sementsov-Ogievskiy wrote: To be used in further patch to identify the device hot-plugged into pcie-root-port. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Anton Kuchin --- include/hw/pci/pci.h | 1 + hw/pci/pci.c | 33 + 2 files changed, 34 insertions(+) diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index d5a40cd058..b6c9c44527 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -341,6 +341,7 @@ void pci_for_each_device_under_bus_reverse(PCIBus *bus, void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin, pci_bus_fn end, void *parent_state); PCIDevice *pci_get_function_0(PCIDevice *pci_dev); +PCIDevice *pci_find_the_only_child(PCIBus *bus, int bus_num, Error **errp); /* Use this wrapper when specific scan order is not required. */ static inline diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 208c16f450..34fd1fb5b8 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -1771,6 +1771,39 @@ void pci_for_each_device(PCIBus *bus, int bus_num, } } +typedef struct TheOnlyChild { +PCIDevice *dev; +int count; +} TheOnlyChild; + +static void the_only_child_fn(PCIBus *bus, PCIDevice *dev, void *opaque) +{ +TheOnlyChild *s = opaque; + +s->dev = dev; +s->count++; +} + +PCIDevice *pci_find_the_only_child(PCIBus *bus, int bus_num, Error **errp) +{ +TheOnlyChild res = {0}; + +pci_for_each_device(bus, bus_num, the_only_child_fn, &res); + +if (!res.dev) { +assert(res.count == 0); +error_setg(errp, "No child devices found"); +return NULL; +} + +if (res.count > 1) { +error_setg(errp, "Several child devices found"); +return NULL; +} + +return res.dev; +} + const pci_class_desc *get_class_desc(int class) { const pci_class_desc *desc;
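The counting idiom in `pci_find_the_only_child()` — visit every child through a callback, remember the last one seen, and count — can be exercised standalone. The sketch below mirrors the patch's `TheOnlyChild` accumulator, with a plain array of names standing in for the PCI bus and `NULL` return standing in for the `error_setg()` paths:

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the patch: a visit-all iterator plus a counting accumulator. */
typedef struct TheOnlyChild {
    const char *dev;
    int count;
} TheOnlyChild;

static void the_only_child_fn(const char *dev, void *opaque)
{
    TheOnlyChild *s = opaque;

    s->dev = dev;
    s->count++;
}

/* Stand-in for pci_for_each_device(): visit every occupied slot. */
static void for_each_device(const char **bus, size_t n,
                            void (*fn)(const char *, void *), void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        if (bus[i]) {
            fn(bus[i], opaque);
        }
    }
}

static const char *find_the_only_child(const char **bus, size_t n)
{
    TheOnlyChild res = {0};

    for_each_device(bus, n, the_only_child_fn, &res);

    if (!res.dev) {
        assert(res.count == 0);
        return NULL;    /* "No child devices found" */
    }
    if (res.count > 1) {
        return NULL;    /* "Several child devices found" */
    }
    return res.dev;
}
```

The accumulator avoids a second pass: one traversal both finds a candidate and detects the zero- and many-children error cases.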
Re: [PATCH v4 12/16] pcie: set power indicator to off on reset by default
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: It should be zero, the only valid values are ON, OFF and BLINK. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Anton Kuchin --- hw/pci/pcie.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 90faf0710a..b8c24cf45f 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -684,6 +684,7 @@ void pcie_cap_slot_reset(PCIDevice *dev) PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE); pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL, + PCI_EXP_SLTCTL_PWR_IND_OFF | PCI_EXP_SLTCTL_ATTN_IND_OFF); if (dev->cap_present & QEMU_PCIE_SLTCAP_PCP) {
Re: [PATCH v4 11/16] pcie: introduce pcie_sltctl_powered_off() helper
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: In pcie_cap_slot_write_config() we check for PCI_EXP_SLTCTL_PWR_OFF in a bad form. We should distinguish PCI_EXP_SLTCTL_PWR which is a "mask" and PCI_EXP_SLTCTL_PWR_OFF which is value for that mask. Better code is in pcie_cap_slot_unplug_request_cb() and in pcie_cap_update_power(). Let's use same pattern everywhere. To simplify things add also a helper. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Anton Kuchin --- hw/pci/pcie.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index db8360226f..90faf0710a 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -39,6 +39,11 @@ #define PCIE_DEV_PRINTF(dev, fmt, ...) \ PCIE_DPRINTF("%s:%x "fmt, (dev)->name, (dev)->devfn, ## __VA_ARGS__) +static bool pcie_sltctl_powered_off(uint16_t sltctl) +{ +return (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_OFF +&& (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF; +} /*** * pci express capability helper functions @@ -395,6 +400,7 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev) if (sltcap & PCI_EXP_SLTCAP_PCP) { power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON; +/* Don't we need to check also (sltctl & PCI_EXP_SLTCTL_PIC) ? */ } pci_for_each_device(sec_bus, pci_bus_num(sec_bus), @@ -579,8 +585,7 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev, return; } -if (((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF) && -((sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_OFF)) { +if (pcie_sltctl_powered_off(sltctl)) { /* slot is powered off -> unplug without round-trip to the guest */ pcie_cap_slot_do_unplug(hotplug_pdev); hotplug_event_notify(hotplug_pdev); @@ -770,10 +775,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev, * this is a work around for guests that overwrite * control of powered off slots before powering them on. 
*/ -if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) && -(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF && -(!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) || -(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_OFF)) { +if ((sltsta & PCI_EXP_SLTSTA_PDS) && pcie_sltctl_powered_off(val) && +!pcie_sltctl_powered_off(old_slt_ctl)) +{ pcie_cap_slot_do_unplug(dev); } pcie_cap_update_power(dev);
Re: [PATCH v4 10/16] pcie: pcie_cap_slot_enable_power() use correct helper
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: *_by_mask() helpers shouldn't be used here (and that's the only one). *_by_mask() helpers do shift their value argument, but in pcie.c code we use values that are already shifted appropriately. Happily, PCI_EXP_SLTCTL_PWR_ON is zero, so shift doesn't matter. But if we apply same helper for PCI_EXP_SLTCTL_PWR_OFF constant it will do wrong thing. So, let's use instead pci_word_test_and_clear_mask() which is already used in the file to clear PCI_EXP_SLTCTL_PWR_OFF bit in pcie_cap_slot_init() and pcie_cap_slot_reset(). Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Anton Kuchin --- hw/pci/pcie.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index ccdb2377e1..db8360226f 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -373,8 +373,8 @@ void pcie_cap_slot_enable_power(PCIDevice *dev) uint32_t sltcap = pci_get_long(exp_cap + PCI_EXP_SLTCAP); if (sltcap & PCI_EXP_SLTCAP_PCP) { -pci_set_word_by_mask(exp_cap + PCI_EXP_SLTCTL, - PCI_EXP_SLTCTL_PCC, PCI_EXP_SLTCTL_PWR_ON); +pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTCTL, + PCI_EXP_SLTCTL_PCC); } }
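The failure mode the commit message describes can be demonstrated with a simplified re-implementation of the `*_by_mask` idiom (reduced from QEMU's `pci_set_word_by_mask()`, which shifts its value argument into the mask's position before merging). `PCI_EXP_SLTCTL_PWR_ON` is zero, so the extra shift is harmless; `PCI_EXP_SLTCTL_PWR_OFF` is already shifted, so shifting it again pushes the bit out of the 16-bit word and the "power off" request would silently clear the bit instead of setting it:

```c
#include <assert.h>
#include <stdint.h>

/* Slot Control constants as laid out in Linux's pci_regs.h. */
#define PCI_EXP_SLTCTL_PCC      0x0400  /* Power Controller Control (mask) */
#define PCI_EXP_SLTCTL_PWR_ON   0x0000  /* value, already in field position */
#define PCI_EXP_SLTCTL_PWR_OFF  0x0400  /* value, already in field position */

/*
 * Simplified sketch of the *_by_mask idiom: the value argument is
 * SHIFTED to the mask's lowest set bit before being merged in, so it
 * must be an unshifted field value, not an in-place register constant.
 */
static uint16_t set_word_by_mask(uint16_t val, uint16_t mask, uint16_t reg)
{
    unsigned shift = __builtin_ctz(mask);

    return (uint16_t)((val & ~mask) | ((uint16_t)(reg << shift) & mask));
}
```

This is why the fix switches to `pci_word_test_and_clear_mask()`, which manipulates the already-positioned bits directly.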
Re: [PATCH v4 09/16] pcie: drop unused PCIExpressIndicator
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: The structure type is unused. Also, it's the only user of corresponding macros, so drop them too. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Anton Kuchin --- include/hw/pci/pcie.h | 8 include/hw/pci/pcie_regs.h | 5 - 2 files changed, 13 deletions(-) diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h index 798a262a0a..3cc2b15957 100644 --- a/include/hw/pci/pcie.h +++ b/include/hw/pci/pcie.h @@ -27,14 +27,6 @@ #include "hw/pci/pcie_sriov.h" #include "hw/hotplug.h" -typedef enum { -/* for attention and power indicator */ -PCI_EXP_HP_IND_RESERVED = PCI_EXP_SLTCTL_IND_RESERVED, -PCI_EXP_HP_IND_ON = PCI_EXP_SLTCTL_IND_ON, -PCI_EXP_HP_IND_BLINK= PCI_EXP_SLTCTL_IND_BLINK, -PCI_EXP_HP_IND_OFF = PCI_EXP_SLTCTL_IND_OFF, -} PCIExpressIndicator; - typedef enum { /* these bits must match the bits in Slot Control/Status registers. * PCI_EXP_HP_EV_xxx = PCI_EXP_SLTCTL_xxxE = PCI_EXP_SLTSTA_xxx diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h index 00b595a82e..1fe0bdd25b 100644 --- a/include/hw/pci/pcie_regs.h +++ b/include/hw/pci/pcie_regs.h @@ -66,11 +66,6 @@ typedef enum PCIExpLinkWidth { #define PCI_EXP_SLTCAP_PSN_SHIFTctz32(PCI_EXP_SLTCAP_PSN) -#define PCI_EXP_SLTCTL_IND_RESERVED 0x0 -#define PCI_EXP_SLTCTL_IND_ON 0x1 -#define PCI_EXP_SLTCTL_IND_BLINK0x2 -#define PCI_EXP_SLTCTL_IND_OFF 0x3 - #define PCI_EXP_SLTCTL_SUPPORTED\ (PCI_EXP_SLTCTL_ABPE | \ PCI_EXP_SLTCTL_PDCE | \
Re: [PATCH v4 08/16] pcie_regs: drop duplicated indicator value macros
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: We already have indicator values in include/standard-headers/linux/pci_regs.h , no reason to reinvent them in include/hw/pci/pcie_regs.h. (and we already have usage of PCI_EXP_SLTCTL_PWR_IND_BLINK and PCI_EXP_SLTCTL_PWR_IND_OFF in hw/pci/pcie.c, so let's be consistent) Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Anton Kuchin --- include/hw/pci/pcie_regs.h | 9 - hw/pci/pcie.c | 13 +++-- 2 files changed, 7 insertions(+), 15 deletions(-) diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h index 963dc2e170..00b595a82e 100644 --- a/include/hw/pci/pcie_regs.h +++ b/include/hw/pci/pcie_regs.h @@ -70,15 +70,6 @@ typedef enum PCIExpLinkWidth { #define PCI_EXP_SLTCTL_IND_ON 0x1 #define PCI_EXP_SLTCTL_IND_BLINK0x2 #define PCI_EXP_SLTCTL_IND_OFF 0x3 -#define PCI_EXP_SLTCTL_AIC_SHIFTctz32(PCI_EXP_SLTCTL_AIC) -#define PCI_EXP_SLTCTL_AIC_OFF \ -(PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_AIC_SHIFT) - -#define PCI_EXP_SLTCTL_PIC_SHIFTctz32(PCI_EXP_SLTCTL_PIC) -#define PCI_EXP_SLTCTL_PIC_OFF \ -(PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_PIC_SHIFT) -#define PCI_EXP_SLTCTL_PIC_ON \ -(PCI_EXP_SLTCTL_IND_ON << PCI_EXP_SLTCTL_PIC_SHIFT) #define PCI_EXP_SLTCTL_SUPPORTED\ (PCI_EXP_SLTCTL_ABPE | \ diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 82ef723983..ccdb2377e1 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -634,8 +634,8 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s) PCI_EXP_SLTCTL_PIC | PCI_EXP_SLTCTL_AIC); pci_word_test_and_set_mask(dev->config + pos + PCI_EXP_SLTCTL, - PCI_EXP_SLTCTL_PIC_OFF | - PCI_EXP_SLTCTL_AIC_OFF); + PCI_EXP_SLTCTL_PWR_IND_OFF | + PCI_EXP_SLTCTL_ATTN_IND_OFF); pci_word_test_and_set_mask(dev->wmask + pos + PCI_EXP_SLTCTL, PCI_EXP_SLTCTL_PIC | PCI_EXP_SLTCTL_AIC | @@ -679,7 +679,7 @@ void pcie_cap_slot_reset(PCIDevice *dev) PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE); pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL, - 
PCI_EXP_SLTCTL_AIC_OFF); + PCI_EXP_SLTCTL_ATTN_IND_OFF); if (dev->cap_present & QEMU_PCIE_SLTCAP_PCP) { /* Downstream ports enforce device number 0. */ @@ -694,7 +694,8 @@ void pcie_cap_slot_reset(PCIDevice *dev) PCI_EXP_SLTCTL_PCC); } -pic = populated ? PCI_EXP_SLTCTL_PIC_ON : PCI_EXP_SLTCTL_PIC_OFF; +pic = populated ? +PCI_EXP_SLTCTL_PWR_IND_ON : PCI_EXP_SLTCTL_PWR_IND_OFF; pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL, pic); } @@ -770,9 +771,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev, * control of powered off slots before powering them on. */ if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) && -(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PIC_OFF && +(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF && (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) || -(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PIC_OFF)) { +(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_OFF)) { pcie_cap_slot_do_unplug(dev); } pcie_cap_update_power(dev);
Re: [PATCH v4 07/16] pcie: pcie_cap_slot_write_config(): use correct macro
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: PCI_EXP_SLTCTL_PIC_OFF is a value, and PCI_EXP_SLTCTL_PIC is a mask. Happily PCI_EXP_SLTCTL_PIC_OFF is a maximum value for this mask and is equal to the mask itself. Still the code looks like a bug. Let's make it more reader-friendly. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Philippe Mathieu-Daudé --- hw/pci/pcie.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 924fdabd15..82ef723983 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -770,9 +770,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev, * control of powered off slots before powering them on. */ if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) && -(val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF && +(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PIC_OFF && (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) || -(old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) { +(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PIC_OFF)) { pcie_cap_slot_do_unplug(dev); } pcie_cap_update_power(dev); Reviewed-by: Anton Kuchin
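The mask-versus-value distinction the commit message raises can be checked directly. `PCI_EXP_SLTCTL_PWR_IND_OFF` (0x0300) happens to equal the full `PCI_EXP_SLTCTL_PIC` field mask, so the buggy shape `(val & VALUE) == VALUE` coincidentally works for OFF — but the same shape misfires for any other indicator value, which is why the mask-then-compare idiom is the safe one:

```c
#include <assert.h>
#include <stdint.h>

/* Power Indicator Control field as laid out in Linux's pci_regs.h. */
#define PCI_EXP_SLTCTL_PIC            0x0300  /* field mask */
#define PCI_EXP_SLTCTL_PWR_IND_ON     0x0100
#define PCI_EXP_SLTCTL_PWR_IND_BLINK  0x0200
#define PCI_EXP_SLTCTL_PWR_IND_OFF    0x0300  /* equals the mask, by chance */

/* Correct idiom: extract the field with its mask, then compare values. */
static int pic_is(uint16_t sltctl, uint16_t value)
{
    return (sltctl & PCI_EXP_SLTCTL_PIC) == value;
}
```

For a register whose indicator is OFF, testing `(val & BLINK) == BLINK` reports true (0x0300 & 0x0200 == 0x0200), while the masked comparison correctly reports false.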
Re: [PATCH v4 06/16] pci/shpc: refactor shpc_device_plug_common()
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: Rename it to shpc_device_get_slot(), to mention what it does rather than how it is used. It also helps to reuse it in further commit. Also, add a return value and get rid of local_err. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/pci/shpc.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c index 9f964b1d70..e7bc7192f1 100644 --- a/hw/pci/shpc.c +++ b/hw/pci/shpc.c @@ -496,8 +496,9 @@ static const MemoryRegionOps shpc_mmio_ops = { .max_access_size = 4, }, }; -static void shpc_device_plug_common(PCIDevice *affected_dev, int *slot, -SHPCDevice *shpc, Error **errp) + +static bool shpc_device_get_slot(PCIDevice *affected_dev, int *slot, + SHPCDevice *shpc, Error **errp) { int pci_slot = PCI_SLOT(affected_dev->devfn); *slot = SHPC_PCI_TO_IDX(pci_slot); @@ -507,21 +508,20 @@ static void shpc_device_plug_common(PCIDevice *affected_dev, int *slot, "controller. Valid slots are between %d and %d.", pci_slot, SHPC_IDX_TO_PCI(0), SHPC_IDX_TO_PCI(shpc->nslots) - 1); -return; +return false; } + +return true; } void shpc_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp) { -Error *local_err = NULL; PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev); SHPCDevice *shpc = pci_hotplug_dev->shpc; int slot; -shpc_device_plug_common(PCI_DEVICE(dev), &slot, shpc, &local_err); -if (local_err) { -error_propagate(errp, local_err); +if (!shpc_device_get_slot(PCI_DEVICE(dev), &slot, shpc, errp)) { return; } @@ -563,16 +563,13 @@ void shpc_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev, void shpc_device_unplug_request_cb(HotplugHandler *hotplug_dev, DeviceState *dev, Error **errp) { -Error *local_err = NULL; PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev); SHPCDevice *shpc = pci_hotplug_dev->shpc; uint8_t state; uint8_t led; int slot; -shpc_device_plug_common(PCI_DEVICE(dev), &slot, shpc, &local_err); -if (local_err) { 
-error_propagate(errp, local_err); +if (!shpc_device_get_slot(PCI_DEVICE(dev), &slot, shpc, errp)) { return; } Reviewed-by: Anton Kuchin
Re: [PATCH v4 05/16] pci/shpc: pass PCIDevice pointer to shpc_slot_command()
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote: We'll need it in further patch to report bridge in QAPI event. Signed-off-by: Vladimir Sementsov-Ogievskiy --- hw/pci/shpc.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c index 959dc470f3..9f964b1d70 100644 --- a/hw/pci/shpc.c +++ b/hw/pci/shpc.c @@ -263,9 +263,10 @@ static bool shpc_slot_is_off(uint8_t state, uint8_t power, uint8_t attn) return state == SHPC_STATE_DISABLED && power == SHPC_LED_OFF; } -static void shpc_slot_command(SHPCDevice *shpc, uint8_t target, +static void shpc_slot_command(PCIDevice *d, uint8_t target, uint8_t state, uint8_t power, uint8_t attn) { +SHPCDevice *shpc = d->shpc; int slot = SHPC_LOGICAL_TO_IDX(target); uint8_t old_state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK); uint8_t old_power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK); @@ -314,8 +315,9 @@ static void shpc_slot_command(SHPCDevice *shpc, uint8_t target, } } -static void shpc_command(SHPCDevice *shpc) +static void shpc_command(PCIDevice *d) { +SHPCDevice *shpc = d->shpc; uint8_t code = pci_get_byte(shpc->config + SHPC_CMD_CODE); uint8_t speed; uint8_t target; @@ -336,7 +338,7 @@ static void shpc_command(SHPCDevice *shpc) state = (code & SHPC_SLOT_STATE_MASK) >> SHPC_SLOT_STATE_SHIFT; power = (code & SHPC_SLOT_PWR_LED_MASK) >> SHPC_SLOT_PWR_LED_SHIFT; attn = (code & SHPC_SLOT_ATTN_LED_MASK) >> SHPC_SLOT_ATTN_LED_SHIFT; -shpc_slot_command(shpc, target, state, power, attn); +shpc_slot_command(d, target, state, power, attn); break; case 0x40 ... 
0x47: speed = code & SHPC_SEC_BUS_MASK; @@ -354,10 +356,10 @@ static void shpc_command(SHPCDevice *shpc) } for (i = 0; i < shpc->nslots; ++i) { if (!(shpc_get_status(shpc, i, SHPC_SLOT_STATUS_MRL_OPEN))) { -shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN, +shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN, SHPC_STATE_PWRONLY, SHPC_LED_ON, SHPC_LED_NO); } else { -shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN, +shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN, SHPC_STATE_NO, SHPC_LED_OFF, SHPC_LED_NO); } } @@ -375,10 +377,10 @@ static void shpc_command(SHPCDevice *shpc) } for (i = 0; i < shpc->nslots; ++i) { if (!(shpc_get_status(shpc, i, SHPC_SLOT_STATUS_MRL_OPEN))) { -shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN, +shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN, SHPC_STATE_ENABLED, SHPC_LED_ON, SHPC_LED_NO); } else { -shpc_slot_command(shpc, i + SHPC_CMD_TRGT_MIN, +shpc_slot_command(d, i + SHPC_CMD_TRGT_MIN, SHPC_STATE_NO, SHPC_LED_OFF, SHPC_LED_NO); } } @@ -410,7 +412,7 @@ static void shpc_write(PCIDevice *d, unsigned addr, uint64_t val, int l) shpc->config[a] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ } if (ranges_overlap(addr, l, SHPC_CMD_CODE, 2)) { -shpc_command(shpc); +shpc_command(d); } shpc_interrupt_update(d); } Reviewed-by: Anton Kuchin
Re: [PATCH v4 04/16] pci/shpc: more generic handle hot-unplug in shpc_slot_command()
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

Free the slot if both conditions (power LED = OFF and state = DISABLED) become true, regardless of the sequence. This is similar to how PCIe hotplug works.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 hw/pci/shpc.c | 52 ++-
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 25e4172382..959dc470f3 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -258,49 +258,59 @@ static void shpc_free_devices_in_slot(SHPCDevice *shpc, int slot)
     }
 }
 
+static bool shpc_slot_is_off(uint8_t state, uint8_t power, uint8_t attn)
+{
+    return state == SHPC_STATE_DISABLED && power == SHPC_LED_OFF;
+}
+
 static void shpc_slot_command(SHPCDevice *shpc, uint8_t target,
                               uint8_t state, uint8_t power, uint8_t attn)
 {
-    uint8_t current_state;
     int slot = SHPC_LOGICAL_TO_IDX(target);
+    uint8_t old_state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK);
+    uint8_t old_power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
+    uint8_t old_attn = shpc_get_status(shpc, slot, SHPC_SLOT_ATTN_LED_MASK);
+
     if (target < SHPC_CMD_TRGT_MIN || slot >= shpc->nslots) {
         shpc_invalid_command(shpc);
         return;
     }
-    current_state = shpc_get_status(shpc, slot, SHPC_SLOT_STATE_MASK);
-    if (current_state == SHPC_STATE_ENABLED && state == SHPC_STATE_PWRONLY) {
+
+    if (old_state == SHPC_STATE_ENABLED && state == SHPC_STATE_PWRONLY) {
         shpc_invalid_command(shpc);
         return;
     }
-    if (power != SHPC_LED_NO) {
+    if (power == SHPC_LED_NO) {
+        power = old_power;
+    } else {
         /* TODO: send event to monitor */
         shpc_set_status(shpc, slot, power, SHPC_SLOT_PWR_LED_MASK);
     }
-    if (attn != SHPC_LED_NO) {
+
+    if (attn == SHPC_LED_NO) {
+        attn = old_attn;
+    } else {
         /* TODO: send event to monitor */
         shpc_set_status(shpc, slot, attn, SHPC_SLOT_ATTN_LED_MASK);
     }
-    if (state != SHPC_STATE_NO) {
+
+    if (state == SHPC_STATE_NO) {
+        state = old_state;
+    } else {
         shpc_set_status(shpc, slot, state, SHPC_SLOT_STATE_MASK);
     }
-    if ((current_state == SHPC_STATE_ENABLED ||
-         current_state == SHPC_STATE_PWRONLY) &&
-        state == SHPC_STATE_DISABLED)
+    if (!shpc_slot_is_off(old_state, old_power, old_attn) &&
+        shpc_slot_is_off(state, power, attn))
     {
-        power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
-        /* TODO: track what monitor requested. */
-        /* Look at LED to figure out whether it's ok to remove the device. */
-        if (power == SHPC_LED_OFF) {
-            shpc_free_devices_in_slot(shpc, slot);
-            shpc_set_status(shpc, slot, 1, SHPC_SLOT_STATUS_MRL_OPEN);
-            shpc_set_status(shpc, slot, SHPC_SLOT_STATUS_PRSNT_EMPTY,
-                            SHPC_SLOT_STATUS_PRSNT_MASK);
-            shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |=
-                SHPC_SLOT_EVENT_MRL |
-                SHPC_SLOT_EVENT_PRESENCE;
-        }
+        shpc_free_devices_in_slot(shpc, slot);
+        shpc_set_status(shpc, slot, 1, SHPC_SLOT_STATUS_MRL_OPEN);
+        shpc_set_status(shpc, slot, SHPC_SLOT_STATUS_PRSNT_EMPTY,
+                        SHPC_SLOT_STATUS_PRSNT_MASK);
+        shpc->config[SHPC_SLOT_EVENT_LATCH(slot)] |=
+            SHPC_SLOT_EVENT_MRL |
+            SHPC_SLOT_EVENT_PRESENCE;
     }
 }

Reviewed-by: Anton Kuchin
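The order-independence this patch establishes can be illustrated with a small standalone model (all names and constant values here are hypothetical, only the shape of the predicate follows the patch): the slot is freed exactly once, when both "state = DISABLED" and "power LED = OFF" hold, no matter which of the two commands arrives first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; NOT the numeric values from hw/pci/shpc.c. */
enum { STATE_NO = 0, STATE_PWRONLY = 1, STATE_ENABLED = 2, STATE_DISABLED = 3 };
enum { LED_NO = 0, LED_ON = 1, LED_BLINK = 2, LED_OFF = 3 };

/* The predicate the patch introduces: a slot counts as "off" once the
 * state is DISABLED and the power LED is OFF, in either order. */
static bool slot_is_off(uint8_t state, uint8_t power)
{
    return state == STATE_DISABLED && power == LED_OFF;
}

typedef struct { uint8_t state, power; } ToySlot;

/* Apply a command (NO values mean "leave unchanged") and report whether
 * this command caused the off-transition, i.e. the point at which
 * shpc_free_devices_in_slot() would run. */
static bool apply_command(ToySlot *s, uint8_t state, uint8_t power)
{
    bool was_off = slot_is_off(s->state, s->power);

    if (power != LED_NO) {
        s->power = power;
    }
    if (state != STATE_NO) {
        s->state = state;
    }
    return !was_off && slot_is_off(s->state, s->power);
}
```

Whichever half of the condition is satisfied first, only the command that completes the pair triggers the free, which is exactly the sequence-independence the commit message claims.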
Re: [PATCH v4 03/16] pci/shpc: shpc_slot_command(): handle PWRONLY -> ENABLED transition
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

The ENABLED -> PWRONLY transition is not allowed, and we handle it with shpc_invalid_command(). But the PWRONLY -> ENABLED transition is silently ignored, which seems wrong. Let's handle it as a valid transition.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 hw/pci/shpc.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 5d71569b13..25e4172382 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -273,28 +273,22 @@ static void shpc_slot_command(SHPCDevice *shpc, uint8_t target,
         return;
     }
-    switch (power) {
-    case SHPC_LED_NO:
-        break;
-    default:
+    if (power != SHPC_LED_NO) {
         /* TODO: send event to monitor */
         shpc_set_status(shpc, slot, power, SHPC_SLOT_PWR_LED_MASK);
     }
-    switch (attn) {
-    case SHPC_LED_NO:
-        break;
-    default:
+    if (attn != SHPC_LED_NO) {
         /* TODO: send event to monitor */
         shpc_set_status(shpc, slot, attn, SHPC_SLOT_ATTN_LED_MASK);
     }
-
-    if ((current_state == SHPC_STATE_DISABLED && state == SHPC_STATE_PWRONLY) ||
-        (current_state == SHPC_STATE_DISABLED && state == SHPC_STATE_ENABLED)) {
-        shpc_set_status(shpc, slot, state, SHPC_SLOT_STATE_MASK);
-    } else if ((current_state == SHPC_STATE_ENABLED ||
-                current_state == SHPC_STATE_PWRONLY) &&
-               state == SHPC_STATE_DISABLED) {
+    if (state != SHPC_STATE_NO) {
         shpc_set_status(shpc, slot, state, SHPC_SLOT_STATE_MASK);
+    }
+
+    if ((current_state == SHPC_STATE_ENABLED ||
+         current_state == SHPC_STATE_PWRONLY) &&
+        state == SHPC_STATE_DISABLED)
+    {
         power = shpc_get_status(shpc, slot, SHPC_SLOT_PWR_LED_MASK);
         /* TODO: track what monitor requested. */
         /* Look at LED to figure out whether it's ok to remove the device. */

Reviewed-by: Anton Kuchin
Re: [PATCH v4 02/16] pci/shpc: change shpc_get_status() return type to uint8_t
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

The function's result always fits in one byte and is always assigned to a uint8_t variable. Also, shpc_get_status() should be symmetric with shpc_set_status(), which takes a uint8_t value argument.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 hw/pci/shpc.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 1b3f619dc9..5d71569b13 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -123,10 +123,13 @@
 #define SHPC_PCI_TO_IDX(pci_slot) ((pci_slot) - 1)
 #define SHPC_IDX_TO_PHYSICAL(slot) ((slot) + 1)
 
-static uint16_t shpc_get_status(SHPCDevice *shpc, int slot, uint16_t msk)
+static uint8_t shpc_get_status(SHPCDevice *shpc, int slot, uint16_t msk)
 {
     uint8_t *status = shpc->config + SHPC_SLOT_STATUS(slot);
-    return (pci_get_word(status) & msk) >> ctz32(msk);
+    uint16_t result = (pci_get_word(status) & msk) >> ctz32(msk);
+
+    assert(result <= UINT8_MAX);
+    return result;
 }
 
 static void shpc_set_status(SHPCDevice *shpc,

Reviewed-by: Anton Kuchin
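The mask-and-shift extraction that shpc_get_status() performs can be sketched as a standalone function (a hypothetical helper name, with the compiler's __builtin_ctz() standing in for QEMU's ctz32()): select the field bits with the mask, then shift right by the number of trailing zero bits in the mask, asserting the result fits in one byte as the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a bit field from a 16-bit status word. The mask selects the
 * field; __builtin_ctz(msk) gives the field's bit offset, so the shift
 * right-aligns the value. Mirrors the logic of shpc_get_status(). */
static uint8_t get_field(uint16_t status_word, uint16_t msk)
{
    uint16_t result = (status_word & msk) >> __builtin_ctz(msk);

    /* The assertion the patch adds: any mask an SHPC field uses covers
     * at most 8 bits, so the shifted value must fit in a byte. */
    assert(result <= UINT8_MAX);
    return (uint8_t)result;
}
```

For example, a two-bit field at bits 7:6 (mask 0x00C0) with both bits set extracts as 3; the assertion would only fire for a mask wider than 8 bits, which SHPC slot-status fields never use.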
Re: [PATCH v4 01/16] pci/shpc: set attention led to OFF on reset
On 13/02/2023 16:00, Vladimir Sementsov-Ogievskiy wrote:

0 is not a valid state for the LED. Let's start with OFF.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 hw/pci/shpc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index fca7f6691a..1b3f619dc9 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -223,6 +223,7 @@ void shpc_reset(PCIDevice *d)
                             SHPC_SLOT_STATUS_PRSNT_MASK);
             shpc_set_status(shpc, i, SHPC_LED_OFF, SHPC_SLOT_PWR_LED_MASK);
         }
+        shpc_set_status(shpc, i, SHPC_LED_OFF, SHPC_SLOT_ATTN_LED_MASK);
         shpc_set_status(shpc, i, 0, SHPC_SLOT_STATUS_66);
     }
     shpc_set_sec_bus_speed(shpc, SHPC_SEC_BUS_33);

Reviewed-by: Anton Kuchin
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 02/02/2023 11:59, Juan Quintela wrote: Anton Kuchin wrote: On 01/02/2023 16:26, Juan Quintela wrote: Anton Kuchin wrote: On 19/01/2023 18:02, Stefan Hajnoczi wrote: On Thu, 19 Jan 2023 at 10:29, Anton Kuchin wrote: On 19/01/2023 16:30, Stefan Hajnoczi wrote: On Thu, 19 Jan 2023 at 07:43, Anton Kuchin wrote: On 18/01/2023 17:52, Stefan Hajnoczi wrote: On Sun, 15 Jan 2023 at 12:21, Anton Kuchin wrote: Once told that, I think that you are making your live harder in the future when you add the other migratable devices. I am assuming here that your "underlying device" is: enum VhostUserFSType { VHOST_USER_NO_MIGRATABLE = 0, // The one we are doing here VHOST_USER_EXTERNAL_MIGRATABLE, // The one you describe for the future VHOST_USER_INTERNAL_MIGRATABLE, }; struct VHostUserFS { /*< private >*/ VirtIODevice parent; VHostUserFSConf conf; struct vhost_virtqueue *vhost_vqs; struct vhost_dev vhost_dev; VhostUserState vhost_user; VirtQueue **req_vqs; VirtQueue *hiprio_vq; int32_t bootindex; enum migration_type; /*< public >*/ }; static int vhost_user_fs_pre_save(void *opaque) { VHostUserFS *s = opaque; if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) { error_report("Migration of vhost-user-fs devices requires internal FUSE " "state of backend to be preserved. If orchestrator can " "guarantee this (e.g. 
dst connects to the same backend " "instance or backend state is migrated) set 'vhost-user-fs' " "migration capability to true to enable migration."); return -1; } if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) { return 0; } if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) { error_report("still not implemented"); return -1; } assert("we don't reach here"); } Your initial vmstateDescription static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", .unmigratable = 1, .minimum_version_id = 0, .version_id = 0, .fields = (VMStateField[]) { VMSTATE_INT8(migration_type, struct VHostUserFS), VMSTATE_VIRTIO_DEVICE, VMSTATE_END_OF_LIST() }, .pre_save = vhost_user_fs_pre_save, }; And later you change it to something like: static bool vhost_fs_user_internal_state_needed(void *opaque) { VHostUserFS *s = opaque; return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE; } static const VMStateDescription vmstate_vhost_user_fs_internal_sub = { .name = "vhost-user-fs/internal", .version_id = 1, .minimum_version_id = 1, .needed = &vhost_fs_user_internal_state_needed, .fields = (VMStateField[]) { // Whatever VMSTATE_END_OF_LIST() } }; static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", .minimum_version_id = 0, .version_id = 0, .fields = (VMStateField[]) { VMSTATE_INT8(migration_type, struct VHostUserFS), VMSTATE_VIRTIO_DEVICE, VMSTATE_END_OF_LIST() }, .pre_save = vhost_user_fs_pre_save, .subsections = (const VMStateDescription*[]) { &vmstate_vhost_user_fs_internal_sub, NULL } }; And you are done. I will propose to use a property to set migration_type, but I didn't want to write the code right now. I have a little problem with implementation here and need an advice: Can we make this property adjustable at runtime after device was realized? There is a statement in qemu wiki [1] that makes me think this is possible but I couldn't find any code for it or example in other devices. 
> "Properties are, by default, locked while the device is realized. Exceptions > can be made by the devices themselves in order to implement a way for a user > to interact with a device while it is realized." Or is your idea just to set this property once at construction and keep it constant for device lifetime? [1] https://wiki.qemu.org/Features/QOM I think that this proposal will make Stephan happy, and it is just adding and extra uint8_t that is helpul to implement everything. That is exactly the approach I'm trying to implement it right now. Single flag can be convenient for orchestrator but makes it too hard to account in all cases for all devices on qemu side without breaking future compatibility. So I'm rewriting it with properties. Nice. That would be my proposal. Just a bit complicated for a proof of concetp. BTW do you think each vhost-user device should have its own enum of migration types or maybe we could make them common for all
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 01/02/2023 16:26, Juan Quintela wrote: Anton Kuchin wrote: On 19/01/2023 18:02, Stefan Hajnoczi wrote: On Thu, 19 Jan 2023 at 10:29, Anton Kuchin wrote: On 19/01/2023 16:30, Stefan Hajnoczi wrote: On Thu, 19 Jan 2023 at 07:43, Anton Kuchin wrote: On 18/01/2023 17:52, Stefan Hajnoczi wrote: On Sun, 15 Jan 2023 at 12:21, Anton Kuchin wrote: Hi Sorry to come so late into the discussion. You are just in time, I'm working on v2 of this patch right now :-) +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; I don't object to extend the vmstate this way. But I object to the migration capability for several reasons: - This feature has _nothing_ to do with migration, the problem, what it describes, etc is related to vhost_user_fs. - The number of migration capabilities is limited And we add checks to see if they are valid, consistent, etc see qemu/migration/migration.c:migrate_caps_check() - As Stefan says, we can have several vhost_user_fs devices, and each one should know if it can migrate or not. - We have to options for the orchestator: * it knows that all the vhost_user_fs devices can be migration Then it can add a property to each vhost_user_fs device * it don't know it Then it is a good idea that we are not migrating this VM. 
I think we'd be better without a new marker because migration code has standard generic way of solving such puzzles that I described above. So adding new marker would go against existing practice. But if you could show me where I missed something I'll be grateful and will fix it to avoid potential problems. I'd also be happy to know the opinion of Dr. David Alan Gilbert. If everybody agrees that any vhost_user_fs device is going to have a virtio device, then I agree with you that the marker is not needed at this point. Once told that, I think that you are making your live harder in the future when you add the other migratable devices. I am assuming here that your "underlying device" is: enum VhostUserFSType { VHOST_USER_NO_MIGRATABLE = 0, // The one we are doing here VHOST_USER_EXTERNAL_MIGRATABLE, // The one you describe for the future VHOST_USER_INTERNAL_MIGRATABLE, }; struct VHostUserFS { /*< private >*/ VirtIODevice parent; VHostUserFSConf conf; struct vhost_virtqueue *vhost_vqs; struct vhost_dev vhost_dev; VhostUserState vhost_user; VirtQueue **req_vqs; VirtQueue *hiprio_vq; int32_t bootindex; enum migration_type; /*< public >*/ }; static int vhost_user_fs_pre_save(void *opaque) { VHostUserFS *s = opaque; if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) { error_report("Migration of vhost-user-fs devices requires internal FUSE " "state of backend to be preserved. If orchestrator can " "guarantee this (e.g. 
dst connects to the same backend " "instance or backend state is migrated) set 'vhost-user-fs' " "migration capability to true to enable migration."); return -1; } if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) { return 0; } if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) { error_report("still not implemented"); return -1; } assert("we don't reach here"); } Your initial vmstateDescription static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", .unmigratable = 1, .minimum_version_id = 0, .version_id = 0, .fields = (VMStateField[]) { VMSTATE_INT8(migration_type, struct VHostUserFS), VMSTATE_VIRTIO_DEVICE, VMSTATE_END_OF_LIST() }, .pre_save = vhost_user_fs_pre_save, }; And later you change it to something like: static bool vhost_fs_user_internal_state_
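The decision table in the proposed pre_save hook can be reduced to a standalone sketch (the enum and its three values follow the thread; the QEMU plumbing such as error_report() and the VMStateDescription is omitted, and the 0/-1 return convention mirrors the proposal):

```c
#include <assert.h>

/* Per-device migration type, as proposed in the thread. */
enum VhostUserFSType {
    VHOST_USER_NO_MIGRATABLE = 0,     /* the behaviour implemented here */
    VHOST_USER_EXTERNAL_MIGRATABLE,   /* orchestrator preserves back-end state */
    VHOST_USER_INTERNAL_MIGRATABLE,   /* state carried in the migration stream */
};

/* Core of the proposed pre_save: return 0 when saving may proceed,
 * -1 when migration must be blocked for this device. */
static int pre_save_check(enum VhostUserFSType t)
{
    switch (t) {
    case VHOST_USER_EXTERNAL_MIGRATABLE:
        return 0;   /* orchestrator guarantees the FUSE state survives */
    case VHOST_USER_NO_MIGRATABLE:
        return -1;  /* back-end FUSE state would be lost */
    case VHOST_USER_INTERNAL_MIGRATABLE:
    default:
        return -1;  /* "still not implemented" in the proposal */
    }
}
```

The point of the design is visible here: each device carries its own answer, so a VM can mix migratable and non-migratable vhost-user-fs devices, which a single global migration capability cannot express.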
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 26/01/2023 17:13, Stefan Hajnoczi wrote: On Thu, 26 Jan 2023 at 09:20, Anton Kuchin wrote: On 25/01/2023 21:46, Stefan Hajnoczi wrote: On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) Hi Anton, Sorry for holding up your work with the discussions that we had. I still feel it's important to agree on command-line and/or vhost-user protocol changes that will be able to support non-migratable, stateless migration/reconnect, and stateful migration vhost-user-fs back-ends. All three will exist. As a next step, could you share your code that implements the QEMU side of stateless migration? I think that will make it clearer whether a command-line option (migration capability or per-device) is sufficient or whether the vhost-user protocol needs to be extended. If the vhost-user protocol is extended then maybe no user-visible changes are necessary. QEMU will know if the vhost-user-fs backend supports migration and which type of migration. It can block migration in cases where it's not possible. Thanks, Stefan Thank you, Stefan, That's OK. The discussion is very helpful and showed me some parts that should to be checked to make sure no harm is done by this feature. 
I needed some time to step back, review my approach to this feature with all the valuable feedback and ideas that were suggested, and check what other backend implementations can or can't do. I'll reply today to the emails with questions that were addressed to me.

This is all the code that QEMU needs to support stateless migration.

Do you mean the backend and/or orchestrator parts? It's unclear to me how the destination QEMU is able to connect to the vhost-user back-end while the source QEMU is still connected. I thought additional QEMU changes would be necessary to make the migration handover work.

Stefan

Oh, yes, that is exactly the part I'm checking right now: I was testing the scenario where the VM is saved to a file and then restored from the file without source and destination running at the same time.
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 25/01/2023 21:46, Stefan Hajnoczi wrote: On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) Hi Anton, Sorry for holding up your work with the discussions that we had. I still feel it's important to agree on command-line and/or vhost-user protocol changes that will be able to support non-migratable, stateless migration/reconnect, and stateful migration vhost-user-fs back-ends. All three will exist. As a next step, could you share your code that implements the QEMU side of stateless migration? I think that will make it clearer whether a command-line option (migration capability or per-device) is sufficient or whether the vhost-user protocol needs to be extended. If the vhost-user protocol is extended then maybe no user-visible changes are necessary. QEMU will know if the vhost-user-fs backend supports migration and which type of migration. It can block migration in cases where it's not possible. Thanks, Stefan Thank you, Stefan, That's OK. The discussion is very helpful and showed me some parts that should to be checked to make sure no harm is done by this feature. I needed some time to step back, review my approach to this feature with all valuable feedback and ideas that were suggested and check what other backend implementations can or can't do. 
I'll reply today to the emails with questions that were addressed to me.

This is all the code that QEMU needs to support stateless migration.

Do you mean the backend and/or orchestrator parts? I'll think about how a protocol extension can help us make this feature transparent to users.
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 23/01/2023 16:09, Stefan Hajnoczi wrote: On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin wrote: On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: On 22/01/2023 16:46, Michael S. Tsirkin wrote: On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: This flag should be set when qemu don't need to worry about any external state stored in vhost-user daemons during migration: don't fail migration, just pack generic virtio device states to migration stream and orchestrator guarantees that the rest of the state will be present at the destination to restore full context and continue running. Sorry I still do not get it. So fundamentally, why do we need this property? vhost-user-fs is not created by default that we'd then need opt-in to the special "migrateable" case. That's why I said it might make some sense as a device property as qemu tracks whether device is unplugged for us. But as written, if you are going to teach the orchestrator about vhost-user-fs and its special needs, just teach it when to migrate and where not to migrate. Either we describe the special situation to qemu and let qemu make an intelligent decision whether to allow migration, or we trust the orchestrator. And if it's the latter, then 'migrate' already says orchestrator decided to migrate. The problem I'm trying to solve is that most of vhost-user devices now block migration in qemu. And this makes sense since qemu can't extract and transfer backend daemon state. But this prevents us from updating qemu executable via local migration. So this flag is intended more as a safety check that says "I know what I'm doing". I agree that it is not really necessary if we trust the orchestrator to request migration only when the migration can be performed in a safe way. 
But changing the current behavior of vhost-user-fs from "always blocks migration" to "migrates partial state whenever orchestrator requests it" seems a little dangerous and can be misinterpreted as full support for migration in all cases. It's not really different from block is it? orchestrator has to arrange for backend migration. I think we just assumed there's no use-case where this is practical for vhost-user-fs so we blocked it. But in any case it's orchestrator's responsibility. Yes, you are right. So do you think we should just drop the blocker without adding a new flag? I'd be inclined to. I am curious what do dgilbert and stefanha think though. If the migration blocker is removed, what happens when a user attempts to migrate with a management tool and/or vhost-user-fs server implementation that don't support migration? There will be no matching fuse-session in destination endpoint so all requests to this fs will fail until it is remounted from guest to send new FUSE_INIT message that does session setup. Anton: Can you explain how stateless migration will work on the vhost-user-fs back-end side? Is it reusing vhost-user reconnect functionality or introducing a new mode for stateless migration? I guess the vhost-user-fs back-end implementation is required to implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all in-flight requests when vrings are stopped? It reuses existing vhost-user reconnect code to resubmit inflight requests. Sure, backend needs to support this feature - presence of required features is checked by generic vhost and vhost-user code during init and if something is missing migration blocker is assigned to the device (not a static one in vmstate that I remove in this patch, but other per-device kind of blocker). Stefan
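The dirty-memory logging mentioned above for VHOST_F_LOG_ALL can be illustrated with a toy model (this is a conceptual sketch with made-up names, not QEMU's actual vhost log format): while logging is enabled, every write into guest memory also sets the containing page's bit in a bitmap, and the migration side scans and clears that bitmap to know which pages to re-send.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages */
#define NPAGES     64   /* small enough for one 64-bit dirty word */

/* Toy back-end: guest memory plus a one-bit-per-page dirty log. */
typedef struct {
    bool log_enabled;                  /* VHOST_F_LOG_ALL negotiated and on */
    uint64_t dirty;                    /* dirty bitmap, bit i = page i */
    uint8_t mem[NPAGES << PAGE_SHIFT];
} ToyBackend;

/* Guest-visible write: mirrored into the dirty log only while logging. */
static void backend_write(ToyBackend *b, size_t addr, uint8_t val)
{
    b->mem[addr] = val;
    if (b->log_enabled) {
        b->dirty |= UINT64_C(1) << (addr >> PAGE_SHIFT);
    }
}

/* Migration side: count the pages that need re-sending, then clear. */
static int drain_dirty(ToyBackend *b)
{
    int n = __builtin_popcountll(b->dirty);
    b->dirty = 0;
    return n;
}
```

The same two-sided contract underlies the real protocol: the back-end must log every write between SET_LOG_BASE and the final vring stop, and must additionally drain in-flight requests when the vrings are stopped so no write escapes the log.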
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 22/01/2023 16:46, Michael S. Tsirkin wrote: On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: This flag should be set when qemu don't need to worry about any external state stored in vhost-user daemons during migration: don't fail migration, just pack generic virtio device states to migration stream and orchestrator guarantees that the rest of the state will be present at the destination to restore full context and continue running. Sorry I still do not get it. So fundamentally, why do we need this property? vhost-user-fs is not created by default that we'd then need opt-in to the special "migrateable" case. That's why I said it might make some sense as a device property as qemu tracks whether device is unplugged for us. But as written, if you are going to teach the orchestrator about vhost-user-fs and its special needs, just teach it when to migrate and where not to migrate. Either we describe the special situation to qemu and let qemu make an intelligent decision whether to allow migration, or we trust the orchestrator. And if it's the latter, then 'migrate' already says orchestrator decided to migrate. The problem I'm trying to solve is that most of vhost-user devices now block migration in qemu. And this makes sense since qemu can't extract and transfer backend daemon state. But this prevents us from updating qemu executable via local migration. So this flag is intended more as a safety check that says "I know what I'm doing". I agree that it is not really necessary if we trust the orchestrator to request migration only when the migration can be performed in a safe way. But changing the current behavior of vhost-user-fs from "always blocks migration" to "migrates partial state whenever orchestrator requests it" seems a little dangerous and can be misinterpreted as full support for migration in all cases. It's not really different from block is it? orchestrator has to arrange for backend migration. 
I think we just assumed there's no use-case where this is practical for vhost-user-fs, so we blocked it. But in any case it's the orchestrator's responsibility.

Yes, you are right. So do you think we should just drop the blocker without adding a new flag?
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 22/01/2023 10:16, Michael S. Tsirkin wrote: On Fri, Jan 20, 2023 at 07:37:18PM +0200, Anton Kuchin wrote: On 20/01/2023 15:58, Michael S. Tsirkin wrote: On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote: On 19/01/2023 14:51, Michael S. Tsirkin wrote: On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. 
dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); Isn't it possible that some backends are same and some are not? Shouldn't this be a device property then? If some are not the same it is not guaranteed that correct FUSE state is present there, so orchestrator shouldn't set the capability because this can result in destination devices being broken (they'll be fine after the remount in guest, but this is guest visible and is not acceptable). I can imagine smart orchestrator and backend that can transfer internal FUSE state, but we are not there yet, and this would be their responsibility then to ensure endpoint compatibility between src and dst and set the capability (that's why I put "e.g." and "or" in the error description). So instead of relying on the orchestrator how about making it a device property? We have to rely on the orchestrator here and I can't see how a property can help us with this: same device can be migratable or not depending on if the destination is the same host, what features backend supports, how management software works and other factors of environment that are not accessible from qemu or backend daemon. So in the end we'll end up with orchestrator having to setup flags for each device before each migration based on information only it can have - in my opinion this is worse than just giving the orchestrator a single flag that it can set after calculating the decision for the particular migration that it organizes. 
+return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { diff --git a/qapi/migration.json b/qapi/migration.json index 88ecf86ac8..9a229ea884 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -477,6 +477,11 @@ #will be handled faster. This is a performance feature and #should not affect the correctness of postcopy migration. #(since 7.1) +# @vhost-user-fs: If enabled, the migration process will allow migration of +# vhost-user-fs devices, this should be enabled only when +# backend can preserve local FUSE state e.g. for qemu update +# when dst reconects to the same endpoints after migration. +# (since 8.0) # # Features: # @unstable: Members @x-colo and @x-ignore-shar
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 20/01/2023 15:58, Michael S. Tsirkin wrote: On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote: On 19/01/2023 14:51, Michael S. Tsirkin wrote: On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); Isn't it possible that some backends are same and some are not? Shouldn't this be a device property then? 
If some are not the same it is not guaranteed that correct FUSE state is present there, so orchestrator shouldn't set the capability because this can result in destination devices being broken (they'll be fine after the remount in guest, but this is guest visible and is not acceptable). I can imagine smart orchestrator and backend that can transfer internal FUSE state, but we are not there yet, and this would be their responsibility then to ensure endpoint compatibility between src and dst and set the capability (that's why I put "e.g." and "or" in the error description). So instead of relying on the orchestrator how about making it a device property? We have to rely on the orchestrator here and I can't see how a property can help us with this: same device can be migratable or not depending on if the destination is the same host, what features backend supports, how management software works and other factors of environment that are not accessible from qemu or backend daemon. So in the end we'll end up with orchestrator having to setup flags for each device before each migration based on information only it can have - in my opinion this is worse than just giving the orchestrator a single flag that it can set after calculating the decision for the particular migration that it organizes. +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { diff --git a/qapi/migration.json b/qapi/migration.json index 88ecf86ac8..9a229ea884 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -477,6 +477,11 @@ #will be handled faster. This is a performance feature and #should not affect the correctness of postcopy migration. 
#(since 7.1) +# @vhost-user-fs: If enabled, the migration process will allow migration of +# vhost-user-fs devices, this should be enabled only when +# backend can preserve local FUSE state e.g. for qemu update +# when dst reconects to the same endpoints after migration. +# (since 8.0) # # Features: # @unstable: Members @x-colo and @x-ignore-shared are experimental. @@ -492,7 +497,7 @@ 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activat
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 19/01/2023 21:00, Dr. David Alan Gilbert wrote: * Anton Kuchin (antonkuc...@yandex-team.ru) wrote: On 19/01/2023 14:51, Michael S. Tsirkin wrote: On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); Isn't it possible that some backends are same and some are not? Shouldn't this be a device property then? 
If some are not the same it is not guaranteed that correct FUSE state is present there, so orchestrator shouldn't set the capability because this can result in destination devices being broken (they'll be fine after the remount in guest, but this is guest visible and is not acceptable). Shouldn't this be something that's negotiated as a feature bit in the vhost-user protocol - i.e. to say whether the device can do migration? However, I think what you're saying is that a migration might only be doable in some cases - e.g. a local migration - and that's hard for either qemu or the daemon to discover by itself; so it's right that an orchestrator enables it. (I'm not sure the error_report message needs to be quite so verbose - normally a one line clear message is enough that people can then look up). Exactly, migration depends not only on device capabilities but also on backends, hosts, management and other environment not known to qemu or backend so I can't see how this can be device config or negotiated via the protocol. In the message I tried to make clear that just enabling the capability without proper setup can be dangerous to reduce temptation to use it recklessly and get broken virtiofs device later. I'll try to shorten the message. I would appreciate if you could help me with the wording (I'm not a native speaker so building a short sentence with proper level of warning is challenging) I can imagine smart orchestrator and backend that can transfer internal FUSE state, but we are not there yet, and this would be their responsibility then to ensure endpoint compatibility between src and dst and set the capability (that's why I put "e.g." and "or" in the error description). You also need the vhost-user device to be able to do the dirty bitmap updates; is that being checked somewhere? Dave Yes, this is done by generic vhost and vhost-user code on device init. 
In the vhost_user_backend_init() function, if the backend doesn't support the VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature, a migration blocker is assigned to the device. And in vhost_dev_init() there is one more check, for the VHOST_F_LOG_ALL feature, that assigns another migration blocker if it is not supported. +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_propertie
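For illustration, the two checks described above can be reduced to a few lines of self-contained C. This is a sketch of the logic, not QEMU code; the bit positions follow my reading of the vhost-user spec (LOG_SHMFD is protocol feature bit 1, VHOST_F_LOG_ALL is feature bit 26) and should be treated as assumptions here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring the vhost-user spec. */
#define PROTOCOL_F_LOG_SHMFD (1u << 1)    /* backend can sync the dirty log via shared fd */
#define FEATURE_F_LOG_ALL    (1ull << 26) /* backend can log all guest memory writes */

/* Hypothetical reduction of the checks in vhost_user_backend_init() and
 * vhost_dev_init(): migration stays blocked unless the backend can both
 * share the dirty log through a memory-mapped fd and log every write. */
static bool migration_blocked(uint32_t protocol_features, uint64_t features)
{
    if (!(protocol_features & PROTOCOL_F_LOG_SHMFD)) {
        return true;  /* no way to hand the dirty log to the backend */
    }
    if (!(features & FEATURE_F_LOG_ALL)) {
        return true;  /* backend cannot track dirty pages at all */
    }
    return false;
}
```

So a backend that advertises neither bit keeps its migration blocker, exactly as the generic vhost code does today.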
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 19/01/2023 18:02, Stefan Hajnoczi wrote: On Thu, 19 Jan 2023 at 10:29, Anton Kuchin wrote: On 19/01/2023 16:30, Stefan Hajnoczi wrote: On Thu, 19 Jan 2023 at 07:43, Anton Kuchin wrote: On 18/01/2023 17:52, Stefan Hajnoczi wrote: On Sun, 15 Jan 2023 at 12:21, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. 
dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; Will it be possible to extend this vmstate when virtiofsd adds support for stateful migration without breaking migration compatibility? If not, then I think a marker field should be added to the vmstate: 0 - stateless/reconnect migration (the approach you're adding in this patch) 1 - stateful migration (future virtiofsd feature) When the field is 0 there are no further vmstate fields and we trust that the destination vhost-user-fs server already has the necessary state. When the field is 1 there are additional vmstate fields that contain the virtiofsd state. The goal is for QEMU to support 3 migration modes, depending on the vhost-user-fs server: 1. No migration support. 2. Stateless migration. 3. Stateful migration. Sure. These vmstate fields are very generic and mandatory for any virtio device. If in future more state can be transferred in the migration stream, the vmstate can be extended with additional fields. This can be done with new subsections and/or bumping version_id. My concern is that the vmstate introduced in this patch may be unusable when stateful migration is added. So additional compatibility code will need to be introduced to make your stateless migration continue working with extended vmstate. By adding a marker field in this patch it should be possible to continue using the same vmstate for stateless migration without adding extra compatibility code in the future. I understand, but these fields in the vmstate just pack generic virtio device state that is accessible by qemu. 
All additional data could be added later via extra fields. I believe we couldn't pull off any type of virtio device migration without transferring the virtqueues, so more sophisticated types of migration would require adding more data, not modifying this part of the vmstate. What I'm saying is that your patch could define the vmstate such that it contains a field to differentiate between stateless and stateful migration. That way QEMU versions that only support stateless migration (this patch) will be able to migrate to future QEMU versions that support both stateless and stateful without compatibility issues. I double-checked the migration documentation for subsections at https://www.qemu.org/docs/master/devel/migration.html#subsections and believe it perfectly describes our case: virtio device state
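The marker-field idea discussed in this exchange can be illustrated with a toy serializer. This is plain C, not QEMU vmstate code, and the mode values are hypothetical; the point is only that a leading marker lets a stateless-only reader detect, and cleanly reject, a future stateful stream:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical modes: 0 = stateless (nothing follows, destination backend
 * already holds the FUSE state), 1 = stateful (backend state follows). */
enum { MIG_STATELESS = 0, MIG_STATEFUL = 1 };

/* Write the marker byte, then the backend state only in stateful mode.
 * Returns the number of bytes produced. */
static size_t save_device(uint8_t *out, int mode,
                          const uint8_t *state, size_t state_len)
{
    out[0] = (uint8_t)mode;
    if (mode == MIG_STATEFUL) {
        memcpy(out + 1, state, state_len);
        return 1 + state_len;
    }
    return 1;  /* stateless: the marker is the whole payload */
}

/* Returns the number of bytes consumed, or 0 if the stream uses a mode
 * newer than what this reader understands (migration fails cleanly). */
static size_t load_device(const uint8_t *in, size_t len, int highest_known_mode)
{
    if (len < 1 || in[0] > highest_known_mode) {
        return 0;
    }
    return (in[0] == MIG_STATEFUL) ? len : 1;
}
```

A QEMU that only knows mode 0 parses a mode-0 stream fine and refuses a mode-1 stream instead of misinterpreting it, which is the compatibility property Stefan is asking for.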
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 19/01/2023 16:30, Stefan Hajnoczi wrote: On Thu, 19 Jan 2023 at 07:43, Anton Kuchin wrote: On 18/01/2023 17:52, Stefan Hajnoczi wrote: On Sun, 15 Jan 2023 at 12:21, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. 
dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; Will it be possible to extend this vmstate when virtiofsd adds support for stateful migration without breaking migration compatibility? If not, then I think a marker field should be added to the vmstate: 0 - stateless/reconnect migration (the approach you're adding in this patch) 1 - stateful migration (future virtiofsd feature) When the field is 0 there are no further vmstate fields and we trust that the destination vhost-user-fs server already has the necessary state. When the field is 1 there are additional vmstate fields that contain the virtiofsd state. The goal is for QEMU to support 3 migration modes, depending on the vhost-user-fs server: 1. No migration support. 2. Stateless migration. 3. Stateful migration. Sure. These vmstate fields are very generic and mandatory for any virtio device. If in future more state can be transferred in the migration stream, the vmstate can be extended with additional fields. This can be done with new subsections and/or bumping version_id. My concern is that the vmstate introduced in this patch may be unusable when stateful migration is added. So additional compatibility code will need to be introduced to make your stateless migration continue working with extended vmstate. By adding a marker field in this patch it should be possible to continue using the same vmstate for stateless migration without adding extra compatibility code in the future. I understand, but these fields in the vmstate just pack generic virtio device state that is accessible by qemu. 
All additional data could be added later via extra fields. I believe we couldn't pull off any type of virtio device migration without transferring the virtqueues, so more sophisticated types of migration would require adding more data, not modifying this part of the vmstate. The main purpose of this patch is to allow updating a VM to a newer version of qemu via local migration without disruption to the guest. And future versions could hopefully pack more state from the external environment into the migration stream. static Property vuf_properties[] = { diff --git a/qapi/migration.json b/qapi/migration.json index 88ecf86ac8..9a229ea884 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -477,6 +477,11 @@ #will be handled faster. This is a performance feature and #should not affect the correctness of postcopy migration. #(since 7.1) +# @vhost-u
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 19/01/2023 14:51, Michael S. Tsirkin wrote: On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); Isn't it possible that some backends are same and some are not? Shouldn't this be a device property then? 
If some are not the same it is not guaranteed that correct FUSE state is present there, so orchestrator shouldn't set the capability because this can result in destination devices being broken (they'll be fine after the remount in guest, but this is guest visible and is not acceptable). I can imagine smart orchestrator and backend that can transfer internal FUSE state, but we are not there yet, and this would be their responsibility then to ensure endpoint compatibility between src and dst and set the capability (that's why I put "e.g." and "or" in the error description). +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { diff --git a/qapi/migration.json b/qapi/migration.json index 88ecf86ac8..9a229ea884 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -477,6 +477,11 @@ #will be handled faster. This is a performance feature and #should not affect the correctness of postcopy migration. #(since 7.1) +# @vhost-user-fs: If enabled, the migration process will allow migration of +# vhost-user-fs devices, this should be enabled only when +# backend can preserve local FUSE state e.g. for qemu update +# when dst reconects to the same endpoints after migration. +# (since 8.0) # # Features: # @unstable: Members @x-colo and @x-ignore-shared are experimental. @@ -492,7 +497,7 @@ 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, 'validate-uuid', 'background-snapshot', - 'zero-copy-send', 'postcopy-preempt'] } + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } I kind of dislike that it's such a specific flag. Is only vhost-user-fs ever going to be affected? Any way to put it in a way that is more generic? 
Here I agree with you: I would prefer less narrow naming too. But I didn't manage to come up with one. Looks like many other vhost-user devices could benefit from this so maybe "vhost-user-stateless" or something like this would be better. I'm not sure that other types of devices could handle reconnect to the old endpoint as easy as vhost-user-fs, but anyway the support for this flag needs to be implemented for each device individually. What do you think? Any ideas would be appreciated. ## # @MigrationCapabilityStatus: -- 2.34.1
Re: [PATCH] vhost-user-fs: add capability to allow migration
On 18/01/2023 17:52, Stefan Hajnoczi wrote: On Sun, 15 Jan 2023 at 12:21, Anton Kuchin wrote: Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. 
dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; Will it be possible to extend this vmstate when virtiofsd adds support for stateful migration without breaking migration compatibility? If not, then I think a marker field should be added to the vmstate: 0 - stateless/reconnect migration (the approach you're adding in this patch) 1 - stateful migration (future virtiofsd feature) When the field is 0 there are no further vmstate fields and we trust that the destination vhost-user-fs server already has the necessary state. When the field is 1 there are additional vmstate fields that contain the virtiofsd state. The goal is for QEMU to support 3 migration modes, depending on the vhost-user-fs server: 1. No migration support. 2. Stateless migration. 3. Stateful migration. Sure. These vmstate fields are very generic and mandatory for any virtio device. If in future more state can be transfer in migration stream the vmstate can be extended with additional fields. This can be done with new subsections and/or bumping version_id. The main purpose of this patch is to allow update VM to newer version of qemu via local migration without disruption to guest. And future versions hopefully could pack more state from external environment to migration stream. static Property vuf_properties[] = { diff --git a/qapi/migration.json b/qapi/migration.json index 88ecf86ac8..9a229ea884 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -477,6 +477,11 @@ #will be handled faster. This is a performance feature and #should not affect the correctness of postcopy migration. 
#(since 7.1) +# @vhost-user-fs: If enabled, the migration process will allow migration of +# vhost-user-fs devices, this should be enabled only when +# backend can preserve local FUSE state e.g. for qemu update +# when dst reconects to the same endpoints after migration. +# (since 8.0) This is global but a guest can have multiple vhost-user-fs devices connected to different servers. AFAIK vhost-user requires a unix socket and memory shared from the guest, so devices can't be connected to different servers, just to different endpoints on the current host. I would add a qdev property to the device instead of introducing a migration capability. The property would enable "stateless migration". When the property is not set, migration would be prohibited. I did think about that, but this is really not a property of the device; this is a capability of the management software an
[PATCH] vhost-user-fs: add capability to allow migration
Now any vhost-user-fs device makes VM unmigratable, that also prevents qemu update without stopping the VM. In most cases that makes sense because qemu has no way to transfer FUSE session state. But we can give an option to orchestrator to override this if it can guarantee that state will be preserved (e.g. it uses migration to update qemu and dst will run on the same host as src and use the same socket endpoints). This patch keeps default behavior that prevents migration with such devices but adds migration capability 'vhost-user-fs' to explicitly allow migration. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 25 - qapi/migration.json | 7 ++- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index f5049735ac..13d920423e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,7 @@ #include "hw/virtio/vhost-user-fs.h" #include "monitor/monitor.h" #include "sysemu/sysemu.h" +#include "migration/migration.h" static const int user_feature_bits[] = { VIRTIO_F_VERSION_1, @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) return &fs->vhost_dev; } +static int vhost_user_fs_pre_save(void *opaque) +{ +MigrationState *s = migrate_get_current(); + +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { +error_report("Migration of vhost-user-fs devices requires internal FUSE " + "state of backend to be preserved. If orchestrator can " + "guarantee this (e.g. 
dst connects to the same backend " + "instance or backend state is migrated) set 'vhost-user-fs' " + "migration capability to true to enable migration."); +return -1; +} + +return 0; +} + static const VMStateDescription vuf_vmstate = { .name = "vhost-user-fs", -.unmigratable = 1, +.minimum_version_id = 0, +.version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_VIRTIO_DEVICE, +VMSTATE_END_OF_LIST() +}, + .pre_save = vhost_user_fs_pre_save, }; static Property vuf_properties[] = { diff --git a/qapi/migration.json b/qapi/migration.json index 88ecf86ac8..9a229ea884 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -477,6 +477,11 @@ #will be handled faster. This is a performance feature and #should not affect the correctness of postcopy migration. #(since 7.1) +# @vhost-user-fs: If enabled, the migration process will allow migration of +# vhost-user-fs devices, this should be enabled only when +# backend can preserve local FUSE state e.g. for qemu update +# when dst reconects to the same endpoints after migration. +# (since 8.0) # # Features: # @unstable: Members @x-colo and @x-ignore-shared are experimental. @@ -492,7 +497,7 @@ 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, 'validate-uuid', 'background-snapshot', - 'zero-copy-send', 'postcopy-preempt'] } + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } ## # @MigrationCapabilityStatus: -- 2.34.1
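With the patch above applied, the orchestrator would opt in by enabling the capability on both source and destination before starting migration. A sketch of the QMP invocation, using the capability name from the qapi change (the exact workflow around it is up to the management software):

```json
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "vhost-user-fs", "state": true } ] } }
```

Without this command the pre_save hook fires and the migration fails with the error message shown in the patch, which preserves today's default behavior.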
Re: [Virtio-fs] [PATCH] vhost-user-fs: fix features handling
On 09/04/2021 18:56, Vivek Goyal wrote: On Thu, Apr 08, 2021 at 10:55:34PM +0300, Anton Kuchin wrote: Make virtio-fs take into account server capabilities. Just returning the requested features assumes all of them are implemented by the server, and results in setting an unsupported configuration if some of them are absent. Signed-off-by: Anton Kuchin [CC stefan and qemu-devel.] Can you give more details of what problem exactly you are facing? Or is this fix about avoiding a future problem where the device can refuse to support a feature qemu is requesting? This fixes an existing problem: qemu ignores features (un)supported by the backend, and this works fine only if the backend features match the features of qemu. Otherwise qemu incorrectly assumes that the backend supports all of them and calls vhost_set_features() without taking into account the result of the previous vhost_get_features() call. This breaks the protocol and can crash the server or cause incorrect behavior. This is why I hope it can be accepted in time for the 6.0 release. IIUC, this patch is preparing a list of features the vhost-user-fs device can support. Then it calls vhost_get_features(), which makes sure that all these features are supported by the real vhost-user device (hdev->features). If not, the corresponding feature is reset and the remaining features are returned to the caller. When this callback is executed in virtio_bus_device_plugged(), the list of features that the vhost backend supports has already been obtained earlier by vhost_user_get_features() in vuf_device_realize() and stored in hdev->features. vuf_get_features() should return the bitmask of features that are common to the vhost backend (hdev->features) and the frontend (vdev->host_features). But instead it just returns the host features. This feature negotiation bit is called in so many places that I am kind of lost as to who should be doing what. Will leave it to Stefan who understands it much better. 
--- hw/virtio/vhost-user-fs.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index ac4fc34b36..6cf983ba0e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,14 @@ #include "monitor/monitor.h" #include "sysemu/sysemu.h" +static const int user_feature_bits[] = { +VIRTIO_F_VERSION_1, +VIRTIO_RING_F_INDIRECT_DESC, +VIRTIO_RING_F_EVENT_IDX, +VIRTIO_F_NOTIFY_ON_EMPTY, +VHOST_INVALID_FEATURE_BIT +}; + static void vuf_get_config(VirtIODevice *vdev, uint8_t *config) { VHostUserFS *fs = VHOST_USER_FS(vdev); @@ -129,11 +137,12 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status) } static uint64_t vuf_get_features(VirtIODevice *vdev, - uint64_t requested_features, - Error **errp) + uint64_t features, Will it make sense to keep the name requested_features. This kind of makes it clear that caller is requesting these features. I feel there should be few lines of comments also to make it clear what this function is actually doing. This fix was inspired by similar functions in hw/scsi/vhost-scsi-common.c, hw/virtio/vhost-user-vsock.c, hw/block/vhost-user-blk.c and hw/net/vhost_net.c and I borrowed naming from them just to be consistent. IMO this looks like a good place for refactoring because we have more and more vhost-user devices that duplicate that code, but it can be done after the bug is fixed. Vivek + Error **errp) { -/* No feature bits used yet */ -return requested_features; +VHostUserFS *fs = VHOST_USER_FS(vdev); + +return vhost_get_features(&fs->vhost_dev, user_feature_bits, features); } static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq) -- 2.25.1 ___ Virtio-fs mailing list virtio...@redhat.com https://listman.redhat.com/mailman/listinfo/virtio-fs
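The masking that vhost_get_features() performs can be sketched in a few lines of self-contained C. This mirrors the intent described in the reply above, not QEMU's exact implementation, and the sentinel value is illustrative (QEMU uses VHOST_INVALID_FEATURE_BIT to terminate the list):

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_FEATURE_BIT (-1)  /* list terminator, stands in for VHOST_INVALID_FEATURE_BIT */

/* For every bit in the device's whitelist that the backend did NOT
 * advertise, clear it from the feature set offered to the guest.
 * Bits outside the whitelist pass through untouched. */
static uint64_t mask_features(const int *bits,
                              uint64_t backend_features,
                              uint64_t requested_features)
{
    for (const int *b = bits; *b != INVALID_FEATURE_BIT; b++) {
        if (!(backend_features & (1ULL << *b))) {
            requested_features &= ~(1ULL << *b);
        }
    }
    return requested_features;
}
```

This is exactly the bug the patch fixes: the old vuf_get_features() skipped this intersection and offered the guest features the backend never advertised.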
[PATCH] vhost-user-fs: fix features handling
Make virtio-fs take into account server capabilities. Just returning the requested features assumes all of them are implemented by the server, and results in setting an unsupported configuration if some of them are absent. Signed-off-by: Anton Kuchin --- hw/virtio/vhost-user-fs.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c index ac4fc34b36..6cf983ba0e 100644 --- a/hw/virtio/vhost-user-fs.c +++ b/hw/virtio/vhost-user-fs.c @@ -24,6 +24,14 @@ #include "monitor/monitor.h" #include "sysemu/sysemu.h" +static const int user_feature_bits[] = { +VIRTIO_F_VERSION_1, +VIRTIO_RING_F_INDIRECT_DESC, +VIRTIO_RING_F_EVENT_IDX, +VIRTIO_F_NOTIFY_ON_EMPTY, +VHOST_INVALID_FEATURE_BIT +}; + static void vuf_get_config(VirtIODevice *vdev, uint8_t *config) { VHostUserFS *fs = VHOST_USER_FS(vdev); @@ -129,11 +137,12 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status) } static uint64_t vuf_get_features(VirtIODevice *vdev, - uint64_t requested_features, - Error **errp) + uint64_t features, + Error **errp) { -/* No feature bits used yet */ -return requested_features; +VHostUserFS *fs = VHOST_USER_FS(vdev); + +return vhost_get_features(&fs->vhost_dev, user_feature_bits, features); } static void vuf_handle_output(VirtIODevice *vdev, VirtQueue *vq) -- 2.25.1
vhost-user: questions regarding migration
Hi, I'm implementing migration support in a vhost-user backend and have a couple of questions: 1. How can the master be sure that logging has started? We expect that right after a set_features command with the VHOST_F_LOG_ALL flag, all memory modifications will be tracked in the log, but the slave may need a little time to process this command, so there is a chance that some requests go untracked. Is there a way to ensure all requests are logged, or to determine the moment from which tracking starts, so the master can start migrating memory? 2. Why do we need a separate log_addr for the vring, and how can it not be covered by the mem table? As far as I understand, the slave receives the used-ring address in the set_vring_addr command, and to map it correctly we need a valid entry in the memory table anyway. So this field looks redundant to me. Am I missing something? BTW, "log_guest_addr" is mentioned only once in the document, and in the "vring address description" payload it is just called "log"; shouldn't we change these names to match?
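For context on question 1: the dirty log being discussed is just a bitmap with one bit per guest page, and once VHOST_F_LOG_ALL is active the back-end is expected to set the bit for every page it writes. A minimal sketch of the marking step, assuming 4 KiB pages (a real back-end must set the bits atomically, since the front-end scans and clears the shared log concurrently):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Mark every page touched by a write of `len` bytes at guest physical
 * address `gpa` as dirty in the log bitmap (one bit per page). */
static void log_write(uint64_t *log, uint64_t gpa, uint64_t len)
{
    uint64_t page = gpa >> PAGE_SHIFT;
    uint64_t last = (gpa + len - 1) >> PAGE_SHIFT;

    for (; page <= last; page++) {
        log[page / 64] |= 1ULL << (page % 64);
    }
}
```

The race described in the question is real: any write the slave completes between receiving SET_FEATURES and actually honoring F_LOG_ALL never reaches this bitmap, which is why the questioner asks how the master can know when tracking has truly started.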
[RFC PATCH] virtio: Change order of applying runstate to device and bus
On transition to running, first apply state to the bus and then to the device, so the device can access bus functions correctly. When going to stopped, notify the device first and then the bus. Signed-off-by: Anton Kuchin --- hw/virtio/virtio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 04716b5f6c..2ea2edba10 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -3096,7 +3096,7 @@ static void virtio_vmstate_change(void *opaque, int running, RunState state) bool backend_run = running && virtio_device_started(vdev, vdev->status); vdev->vm_running = running; -if (backend_run) { +if (!backend_run) { virtio_set_status(vdev, vdev->status); } @@ -3104,7 +3104,7 @@ static void virtio_vmstate_change(void *opaque, int running, RunState state) k->vmstate_change(qbus->parent, backend_run); } -if (!backend_run) { +if (backend_run) { virtio_set_status(vdev, vdev->status); } } -- 2.20.1
[Qemu-devel] [PATCH] virtio-net: remove redundant qdev from VirtIONet
Signed-off-by: Anton Kuchin --- hw/net/virtio-net.c| 3 +-- include/hw/virtio/virtio-net.h | 1 - 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index b9e1cd71cf..16d2ad5927 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -387,7 +387,7 @@ static void rxfilter_notify(NetClientState *nc) VirtIONet *n = qemu_get_nic_opaque(nc); if (nc->rxfilter_notify_enabled) { -gchar *path = object_get_canonical_path(OBJECT(n->qdev)); +gchar *path = object_get_canonical_path(OBJECT(n)); qapi_event_send_nic_rx_filter_changed(!!n->netclient_name, n->netclient_name, path); g_free(path); @@ -2759,7 +2759,6 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) nc->rxfilter_notify_enabled = 1; QTAILQ_INIT(&n->rsc_chains); -n->qdev = dev; } static void virtio_net_device_unrealize(DeviceState *dev, Error **errp) diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index b96f0c643f..4a1b599d48 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -174,7 +174,6 @@ struct VirtIONet { uint32_t *vlans; virtio_net_conf net_conf; NICConf nic_conf; -DeviceState *qdev; int multiqueue; uint16_t max_queues; uint16_t curr_queues; -- 2.20.1
[Qemu-devel] What events should be handled in iohandler context?
Hi, I'm trying to understand contexts and handlers/notifiers and am a bit confused about the two contexts living in the main loop: qemu_aio_context and iohandler_ctx. The iohandler_ctx comment mentions that qemu_aio_context can't be reused because "iohandlers mustn't be polled by aio_poll(qemu_aio_context)", but there is no explanation why. I tried to find examples and failed to understand why virtio-net eventfds are registered to iohandler_ctx with the generic virtio callback virtio_device_start_ioeventfd_impl(), while the TX bottom half and the handlers of the back-end TAP use qemu_aio_context. Can you explain a little why some fds need to be polled and some not? And how can I choose which context is right for me? Thanks in advance for your help! Anton
[Qemu-devel] [PATCH] block: remove bs from lists before closing
Close involves a flush that can be performed asynchronously, so bs must be protected from being referenced before it is deleted. Signed-off-by: Anton Kuchin --- block.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index 9ae5c0ed2f..b505271a4d 100644 --- a/block.c +++ b/block.c @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs) assert(bdrv_op_blocker_is_empty(bs)); assert(!bs->refcnt); -bdrv_close(bs); - /* remove from list, if necessary */ if (bs->node_name[0] != '\0') { QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list); } QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list); +bdrv_close(bs); + g_free(bs); } -- 2.19.1
Re: [Qemu-devel] aio context ownership during bdrv_close()
On 29/04/2019 13:38, Kevin Wolf wrote: Am 26.04.2019 um 14:24 hat Anton Kuchin geschrieben: I can't figure out ownership of the AioContext during bdrv_close(). As far as I understand, bdrv_unref() should be called with the AioContext acquired to prevent concurrent operations (at least most usages in blockdev.c explicitly acquire and release the context, but not all). I think the theory is like this: 1. bdrv_unref() can only be called from the main thread 2. A block node for which bdrv_close() is called has no references. If there are no more parents that keep it in a non-default iothread, they should be in the main AioContext. So no locks need to be taken during bdrv_close(). In practice, 2. is not fully true today, even though block devices do stop their dataplane and move the block nodes back to the main AioContext on shutdown. I am currently working on some fixes related to this, afterwards the situation should be better. But if the refcount reaches zero and bs is going to be deleted, in bdrv_close() we need to ensure that drain is finished, data is flushed, and there are no more pending coroutines and bottom halves; the drain and flush functions can enter a coroutine and yield in several places. As a result, control returns to the coroutine caller, which releases the AioContext, and when the completion BH continues the cleanup it is executed without ownership of the context. Is this a valid situation? Do you have an example where this happens? Normally, leaving the coroutine means that the AioContext lock is released, but it is later reentered from the same AioContext, so the lock will be taken again. I was wrong. Coroutines do acquire the AioContext when reentered. Moreover, if a yield happens, the bs being deleted has zero refcount but is still present in the graph_bdrv_states and all_bdrv_states lists and can be accidentally accessed. Shouldn't we remove it from these lists as soon as the deletion process starts, as we do for monitor_bdrv_states?
Hm, I think it should only disappear when the image file is actually closed. But in practice, it probably makes little difference either way. I'm still worried about the period of time from when the coroutine yields until it is reentered; it looks like the AioContext can be grabbed by other code and the bs can be treated as valid. I have no example of such a situation either, but I see the bdrv_aio_flush and bdrv_co_flush_to_disk callbacks, which are called during flush and can take an unpredictable time to complete (e.g. raw_aio_flush in file-win32 uses a thread pool internally to process requests, and RBD can also be affected, but I didn't dig deep enough to be sure). If the main loop starts processing the next QMP command before completion is called, the lists will be in an inconsistent state, and hard-to-debug use-after-free bugs and crashes can happen. The fix seems trivial and shouldn't break any existing code. --- diff --git a/block.c b/block.c index 16615bc876..25c3b72520 100644 --- a/block.c +++ b/block.c @@ -4083,14 +4083,14 @@ static void bdrv_delete(BlockDriverState *bs) assert(bdrv_op_blocker_is_empty(bs)); assert(!bs->refcnt); - bdrv_close(bs); - /* remove from list, if necessary */ if (bs->node_name[0] != '\0') { QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list); } QTAILQ_REMOVE(&all_bdrv_states, bs, bs_list); + bdrv_close(bs); + g_free(bs); } -- Kevin
[Qemu-devel] aio context ownership during bdrv_close()
I can't figure out ownership of the AioContext during bdrv_close(). As far as I understand, bdrv_unref() should be called with the AioContext acquired to prevent concurrent operations (at least most usages in blockdev.c explicitly acquire and release the context, but not all). But if the refcount reaches zero and bs is going to be deleted, in bdrv_close() we need to ensure that drain is finished, data is flushed, and there are no more pending coroutines and bottom halves; the drain and flush functions can enter a coroutine and yield in several places. As a result, control returns to the coroutine caller, which releases the AioContext, and when the completion BH continues the cleanup it is executed without ownership of the context. Is this a valid situation? Moreover, if a yield happens, the bs being deleted has zero refcount but is still present in the graph_bdrv_states and all_bdrv_states lists and can be accidentally accessed. Shouldn't we remove it from these lists as soon as the deletion process starts, as we do for monitor_bdrv_states?
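The hazard described above can be reduced to a small self-contained model: if deletion closes (and possibly yields) before unlinking, a concurrent lookup can still find the half-dead object. A sketch of the remove-before-close ordering that the patch in this thread proposes (toy types and a singly linked list standing in for QEMU's QTAILQ; `closed` stands in for bdrv_close(), which may yield mid-flush):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct node {
    char name[32];
    int closed;
    struct node *next;
};

static struct node *all_nodes; /* stands in for all_bdrv_states */

static struct node *lookup(const char *name)
{
    for (struct node *n = all_nodes; n; n = n->next) {
        if (!strcmp(n->name, name)) {
            return n;
        }
    }
    return NULL;
}

static void node_delete(struct node *bs)
{
    /* Unlink first, so nothing can find bs while the close below
     * potentially yields to the event loop in the middle of a flush. */
    struct node **p = &all_nodes;
    while (*p != bs) {
        p = &(*p)->next;
    }
    *p = bs->next;

    bs->closed = 1; /* stands in for bdrv_close() */
}
```

With the opposite order (close first, unlink after), any code running during the yield window could look up a node whose refcount is already zero — which is exactly the use-after-free scenario worried about above.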
[Qemu-devel] Is IOThread for virtio-net a good idea?
As far as I can see, IOThread offloading is currently used only for block devices, and all others are emulated by the main thread. I expect that network devices can also benefit from processing in a separate thread, but I couldn't find any recent work in this direction. I'm going to implement a PoC, but I want to ask whether you know of any previous attempts, and whether there are reasons this would be a waste of time. Are there fundamental obstacles that prevent handling network emulation in an IOThread?
[Qemu-devel] [PATCH] block: split block/qapi.c to avoid linking utilities with qapi
Only part of block/qapi.c is used by qemu-io, qemu-nbd and qemu-img, and it's not really about QAPI, so move it to a separate file to reduce the amount of unused code linked into the utilities and avoid unnecessary dependencies. Signed-off-by: Anton Kuchin --- block.c | 2 +- block/Makefile.objs | 3 +- block/bdrv_info.c | 582 ++ block/qapi.c | 553 +--- hmp.c | 2 +- include/block/{qapi.h => bdrv_info.h} | 6 +- monitor.c | 2 +- qemu-img.c| 2 +- qemu-io-cmds.c| 2 +- 9 files changed, 593 insertions(+), 561 deletions(-) create mode 100644 block/bdrv_info.c rename include/block/{qapi.h => bdrv_info.h} (97%) diff --git a/block.c b/block.c index 4f5ff2cc12..a14d64c2fd 100644 --- a/block.c +++ b/block.c @@ -43,7 +43,7 @@ #include "qemu/notify.h" #include "qemu/option.h" #include "qemu/coroutine.h" -#include "block/qapi.h" +#include "block/bdrv_info.h" #include "qemu/timer.h" #include "qemu/cutils.h" #include "qemu/id.h" diff --git a/block/Makefile.objs b/block/Makefile.objs index 7a81892a52..aab92d59ec 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -14,7 +14,7 @@ block-obj-y += quorum.o block-obj-y += blkdebug.o blkverify.o blkreplay.o block-obj-$(CONFIG_PARALLELS) += parallels.o block-obj-y += blklogwrites.o -block-obj-y += block-backend.o snapshot.o qapi.o +block-obj-y += block-backend.o snapshot.o bdrv_info.o block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o block-obj-$(CONFIG_POSIX) += file-posix.o block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o @@ -41,6 +41,7 @@ block-obj-y += throttle.o copy-on-read.o block-obj-y += crypto.o common-obj-y += stream.o +common-obj-y += qapi.o nfs.o-libs := $(LIBNFS_LIBS) iscsi.o-cflags := $(LIBISCSI_CFLAGS) diff --git a/block/bdrv_info.c b/block/bdrv_info.c new file mode 100644 index 00..425045421b --- /dev/null +++ b/block/bdrv_info.c @@ -0,0 +1,582 @@ +/* + * Block layer qmp and info dump related functions + * + * Copyright (c) 2003-2008 Fabrice Bellard + * + * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */ + +#include "qemu/osdep.h" +#include "block/bdrv_info.h" +#include "block/block_int.h" +#include "block/write-threshold.h" +#include "qapi/error.h" +#include "qapi/qapi-commands-block-core.h" +#include "qapi/qobject-output-visitor.h" +#include "qapi/qapi-visit-block-core.h" +#include "qapi/qmp/qbool.h" +#include "qapi/qmp/qdict.h" +#include "qapi/qmp/qlist.h" +#include "qapi/qmp/qnum.h" +#include "qapi/qmp/qstring.h" +#include "sysemu/block-backend.h" +#include "qemu/cutils.h" + +BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk, +BlockDriverState *bs, Error **errp) +{ +ImageInfo **p_image_info; +BlockDriverState *bs0; +BlockDeviceInfo *info; + +if (!bs->drv) { +error_setg(errp, "Block device %s is ejected", bs->node_name); +return NULL; +} + +info = g_malloc0(sizeof(*info)); +info->file = g_strdup(bs->filename); +info->ro = bs->read_only; +info->drv= g_strdup(bs->drv->format_name); +info->encrypted = bs->encrypted; +info->encryption_key_missing = false; + +info->cache = g_new(BlockdevCacheInfo, 1); +*info->cache = (BlockdevCacheInfo) { +.writeback = blk ? blk_enable_write_cache(blk) : true, +.direct = !!(bs->open_f
Re: [Qemu-devel] [PATCH] qmp: Deprecate query-nodes option of query-blockstats
On 28/01/2019 18:37, Kevin Wolf wrote: Am 28.01.2019 um 16:15 hat Anton Kuchin geschrieben: This option is broken since a6baa60807 in v2.9 and returns mostly zeroes instead of real stats, because the actual querying of BlockStats, which resides in blk, is missing. And it makes no sense, because with this option BlockDriverState-s are iterated, but BlockAcctStats belong to the BlockBackend and not the BDS since 7f0e9da6f13 in v2.5. Signed-off-by: Anton Kuchin Isn't query-nodes the only way to get wr_highest_offset for the protocol layer? oVirt depends on this, as far as I know. Isn't it reported without the query-nodes parameter as "wr_highest_offset" of the "parent" stats entry? What we get with query-nodes is essentially the same output, but without correct data from the backend and with one more copy of almost empty data from the protocol layer. I'll try to find usages of this option in the oVirt code. Kevin
Re: [Qemu-devel] [PATCH 0/2] block: add blk_lookup() for getting device by node_name
On 28/01/2019 17:47, Kevin Wolf wrote: Am 28.01.2019 um 15:27 hat Anton Kuchin geschrieben: Some HMP and QMP commands target a BlockBackend, but for hotplugged devices the BB name is deprecated; instead the name of the root BlockDriverState is set. These patches add functions to search for a BB by the attached root BDS name. This approach isn't perfect, but I couldn't invent a better one and I believe it's more convenient than accessing the BB by QOM path. There could be more than one BlockBackend attached to a single node, so this approach is simply wrong. I was thinking about this but couldn't imagine a configuration with more than one root. Can you give an example, please, so I understand this case better. I think the qdev ID is actually not only a pretty convenient way, but in fact the only logical way to address a guest device (and BlockBackends that can be accessed by the monitor are essentially a part of the guest device). As far as I remember, backends currently have an empty qdev ID, so the only way to address them now is the QOM path like ".../my_hotplug_drive/virtio-backend". So I have to remember the name of the root driver it is connected to and also add this "/virtio-backend" suffix. Can you explain a bit what "monitor referenced" backends are, and which ones can be accessed from the monitor and which cannot? Does your series allow to perform any operation that isn't possible currently? If so, it's probably this operation that needs to be fixed to either accept node names (if it's a backend operation) or a device ID (if it's a frontend operation). Yes. It fixes latency histogram setting, NBD server binding to the remove event, and a couple of HMP commands. I also suspect that this can affect migration, but I couldn't figure out what name is used for identifying drives. Some of this code is also linked into the utilities (qemu-img, qemu-io-cmds ...), which requires linking all the QAPI stuff into them; that seems to be overkill.
I'll prepare the patch to eliminate this dependency and allow using QOM here if we decide that this is the best way. Kevin
[Qemu-devel] [PATCH] qmp: Deprecate query-nodes option of query-blockstats
This option is broken since a6baa60807 in v2.9 and returns mostly zeroes instead of real stats, because the actual querying of BlockStats, which resides in blk, is missing. And it makes no sense, because with this option BlockDriverState-s are iterated, but BlockAcctStats belong to the BlockBackend and not the BDS since 7f0e9da6f13 in v2.5. Signed-off-by: Anton Kuchin --- block/qapi.c | 26 +++--- qapi/block-core.json | 8 +--- qemu-deprecated.texi | 5 + 3 files changed, 13 insertions(+), 26 deletions(-) diff --git a/block/qapi.c b/block/qapi.c index c66f949db8..19d4b4ee42 100644 --- a/block/qapi.c +++ b/block/qapi.c @@ -502,8 +502,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk) &ds->x_flush_latency_histogram); } -static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs, -bool blk_level) +static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs) { BlockStats *s = NULL; @@ -517,7 +516,7 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs, /* Skip automatically inserted nodes that the user isn't aware of in * a BlockBackend-level command. Stay at the exact node for a node-level * command.
*/ -while (blk_level && bs->drv && bs->implicit) { +while (bs->drv && bs->implicit) { bs = backing_bs(bs); assert(bs); } @@ -531,12 +530,12 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs, if (bs->file) { s->has_parent = true; -s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level); +s->parent = bdrv_query_bds_stats(bs->file->bs); } -if (blk_level && bs->backing) { +if (bs->backing) { s->has_backing = true; -s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level); +s->backing = bdrv_query_bds_stats(bs->backing->bs); } return s; @@ -577,21 +576,10 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes, { BlockStatsList *head = NULL, **p_next = &head; BlockBackend *blk; -BlockDriverState *bs; /* Just to be safe if query_nodes is not always initialized */ if (has_query_nodes && query_nodes) { -for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) { -BlockStatsList *info = g_malloc0(sizeof(*info)); -AioContext *ctx = bdrv_get_aio_context(bs); - -aio_context_acquire(ctx); -info->value = bdrv_query_bds_stats(bs, false); -aio_context_release(ctx); - -*p_next = info; -p_next = &info->next; -} +error_setg(errp, "Option query_nodes is deprecated"); } else { for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) { BlockStatsList *info; @@ -604,7 +592,7 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes, } aio_context_acquire(ctx); -s = bdrv_query_bds_stats(blk_bs(blk), true); +s = bdrv_query_bds_stats(blk_bs(blk)); s->has_device = true; s->device = g_strdup(blk_name(blk)); diff --git a/qapi/block-core.json b/qapi/block-core.json index 91685be6c2..2dd5f6032c 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -892,13 +892,7 @@ # # Query the @BlockStats for all virtual block devices. # -# @query-nodes: If true, the command will query all the block nodes -# that have a node name, in a list which will include "parent" -# information, but not "backing". 
-# If false or omitted, the behavior is as before - query all the -# device backends, recursively including their "parent" and -# "backing". Filter nodes that were created implicitly are -# skipped over in this mode. (Since 2.3) +# @query-nodes: deprecated since 3.2 # # Returns: A list of @BlockStats for each virtual block devices. # diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi index 219206a836..e1e04ced7d 100644 --- a/qemu-deprecated.texi +++ b/qemu-deprecated.texi @@ -112,6 +112,11 @@ Use ``device_add'' for hotplugging vCPUs instead of ``cpu-add''. See documentation of ``query-hotpluggable-cpus'' for additional details. +@subsection query-blockstats (since 3.2) + +"query-nodes" parameter is not supported anymore because blockstats +are not a property of a node. + @section Human Monitor Protocol (HMP) commands @subsection The hub_id parameter of 'hostfwd_add' / 'hostfwd_remove' (since 3.1) -- 2.19.1
[Qemu-devel] [PATCH 1/2] block: add functions to search BlockBackend by root BDS name
BlockBackend name is empty if it is added with the '-blockdev' and '-device' options or hotplugged with QMP, but callers still expect the backend to be accessible by name for operations like commit or statistics access. Introduce the blk_lookup function to search both by name and by BDS-root node_name. Signed-off-by: Anton Kuchin --- block/block-backend.c | 29 + include/sysemu/block-backend.h | 7 +++ 2 files changed, 36 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 60d37a0c3d..86a492853c 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -684,6 +684,35 @@ static BlockBackend *bdrv_first_blk(BlockDriverState *bs) return NULL; } +/* + * Return the BlockBackend that has attached BDS-tree root with + * node_name @node_name if it exists, else null. + * @node_name must not be null. + */ +static BlockBackend *blk_by_root_name(const char *node_name) +{ +BlockBackend *blk = NULL; + +assert(node_name); +while ((blk = blk_all_next(blk)) != NULL) { +BlockDriverState *bs = blk_bs(blk); +if (bs && !strcmp(node_name, bs->node_name)) { +return blk; +} +} +return NULL; +} + +BlockBackend *blk_lookup(const char *name) +{ +assert(name); +BlockBackend *blk = blk_by_name(name); +if (!blk) { +blk = blk_by_root_name(name); +} +return blk; +} + /* * Returns true if @bs has an associated BlockBackend. */ diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index c96bcdee14..290b8f8fc9 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -85,6 +85,13 @@ void blk_unref(BlockBackend *blk); void blk_remove_all_bs(void); const char *blk_name(const BlockBackend *blk); BlockBackend *blk_by_name(const char *name); + +/* + * Search BlockBackend by name or root BlockDriverState node_name. + * Hotplug BlockBackends have no name, so we also need to check BDS-tree roots. + * @name must not be null.
+ */ +BlockBackend *blk_lookup(const char *name); BlockBackend *blk_next(BlockBackend *blk); BlockBackend *blk_all_next(BlockBackend *blk); bool monitor_add_blk(BlockBackend *blk, const char *name, Error **errp); -- 2.19.1
[Qemu-devel] [PATCH 0/2] block: add blk_lookup() for getting device by node_name
Some HMP and QMP commands target a BlockBackend, but for hotplugged devices the BB name is deprecated; instead the name of the root BlockDriverState is set. These patches add functions to search for a BB by the attached root BDS name. This approach isn't perfect, but I couldn't invent a better one and I believe it's more convenient than accessing the BB by QOM path. Anton Kuchin (2): block: add functions to search BlockBackend by root BDS name block: migrate callers from blk_by_name to blk_lookup block/block-backend.c | 29 + blockdev-nbd.c | 2 +- blockdev.c | 6 +++--- hmp.c | 2 +- include/sysemu/block-backend.h | 7 +++ 5 files changed, 41 insertions(+), 5 deletions(-) -- 2.19.1
[Qemu-devel] [PATCH 2/2] block: migrate callers from blk_by_name to blk_lookup
Signed-off-by: Anton Kuchin --- blockdev-nbd.c | 2 +- blockdev.c | 6 +++--- hmp.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/blockdev-nbd.c b/blockdev-nbd.c index d73ac1b026..f2ea7318cf 100644 --- a/blockdev-nbd.c +++ b/blockdev-nbd.c @@ -162,7 +162,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name, return; } -on_eject_blk = blk_by_name(device); +on_eject_blk = blk_lookup(device); bs = bdrv_lookup_bs(device, device, errp); if (!bs) { diff --git a/blockdev.c b/blockdev.c index a8fa8748a9..bb01c41038 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1082,7 +1082,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict) BlockDriverState *bs; AioContext *aio_context; -blk = blk_by_name(device); +blk = blk_lookup(device); if (!blk) { monitor_printf(mon, "Device '%s' not found\n", device); return; @@ -3066,7 +3066,7 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict) return; } -blk = blk_by_name(id); +blk = blk_lookup(id); if (!blk) { error_report("Device '%s' not found", id); return; @@ -4431,7 +4431,7 @@ void qmp_x_block_latency_histogram_set( bool has_boundaries_flush, uint64List *boundaries_flush, Error **errp) { -BlockBackend *blk = blk_by_name(device); +BlockBackend *blk = blk_lookup(device); BlockAcctStats *stats; int ret; diff --git a/hmp.c b/hmp.c index b2a2b1f84e..7b03d5c1d7 100644 --- a/hmp.c +++ b/hmp.c @@ -2460,7 +2460,7 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict) Error *err = NULL; int ret; -blk = blk_by_name(device); +blk = blk_lookup(device); if (!blk) { BlockDriverState *bs = bdrv_lookup_bs(NULL, device, &err); if (bs) { -- 2.19.1
Re: [Qemu-devel] [RFC] block: Is name of BlockBackend deprecated with -blockdev parameter?
I want to make it consistent, but I'm not sure I understand all aspects of the current state and the legacy clients we need to support. The initial idea was to have a single function blk_lookup(char *name) and search all namespaces internally, as they are guaranteed to have no intersection. This will work a little slower, but I see no hot paths that would be affected. And this way the transition will be transparent to users, but I'm not sure this can be done in a backward-compatible way. Do you have any suggestions how to do it correctly? On 11/12/2018 12:47, Kevin Wolf wrote: [...] Anton Kuchin writes: [...] As far as I understand, all named backends are stored in the monitor_block_backends list, but I can't get what the point of having this list is, and why the parse_drive() function doesn't call monitor_add_blk() like blockdev_init() does with the -drive option. Can someone give me a hint on this? And what is monitor_block_backends used for? What does a monitor reference mean in this context, and how do backends in this list differ from all others? I also noticed that some commands fall back to searching by qdev_id or BDS->node_name, but at creation time (both in bdrv_assign_node_name and monitor_add_blk) it is already checked that names are unique across these namespaces, so maybe it would be useful to introduce a generic search function? Yes, BlockBackend names are not supposed to be used in the external interfaces any more. If the QMP command is about the backend, it will take a node name, and if it's about the guest device (which may not even have a node attached with removable media), it will take a qdev ID. The implementation of qmp_x_block_latency_histogram_set() is incorrect, it shouldn't use blk_by_name(). I wonder where it got that pattern from because it was added long after all other QMP commands had switched to qmp_get_blk() or bdrv_lookup_bs(). Do we still need blk_by_name() to be public?
Maybe we can replace it with a new function like blk_lookup() and hide blk_by_name() for internal use only, or fix its behavior to search by qdev_id first and, if that fails, fall back to the old way of searching by device_id? hmp_commit() should indeed use bdrv_lookup_bs() to also accept node names. But it also checks blk state before commit, so I'm not quite sure it can work correctly without the blk. I think we reviewed QMP to make sure we converted everything and forgot about HMP commands that aren't mapped to QMP. Kevin Anton
[Qemu-devel] [RFC] block: Is name of BlockBackend deprecated with -blockdev parameter?
Hello, I'm trying to switch from the -drive parameter to -blockdev + -device and am having problems. It looks like with this option I have no way to set the name of the created BlockBackend, but some QMP and HMP commands try to find the blk with blk_by_name() and fail to locate my device (e.g. hmp_commit, qmp_x_block_latency_histogram_set ...). Was this intentional, and are BB names going to be deprecated? This also seems to be the case for all block devices hotplugged with QMP, as they use the same code path. As far as I understand, all named backends are stored in the monitor_block_backends list, but I can't get what the point of having this list is, and why the parse_drive() function doesn't call monitor_add_blk() like blockdev_init() does with the -drive option. Can someone give me a hint on this? I also noticed that some commands fall back to searching by qdev_id or BDS->node_name, but at creation time (both in bdrv_assign_node_name and monitor_add_blk) it is already checked that names are unique across these namespaces, so maybe it would be useful to introduce a generic search function? Thanks, Anton
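The "generic search function" idea floated in this thread relies on one property: because names are checked for uniqueness across the namespaces at creation time, a lookup can simply try each namespace in turn, and the order cannot change the result. A self-contained sketch of that idea (plain string tables standing in for QEMU's BlockBackend list and node-name list; all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy namespaces. In QEMU these would be the monitor BlockBackend
 * names and the BDS node names; creation-time checks are assumed
 * to guarantee that the namespaces never overlap. */
static const char *blk_names[]  = { "ide0-hd0", NULL };
static const char *node_names[] = { "my_hotplug_drive", NULL };

static const char *find_in(const char **tbl, const char *name)
{
    for (int i = 0; tbl[i]; i++) {
        if (!strcmp(tbl[i], name)) {
            return tbl[i];
        }
    }
    return NULL;
}

/* Generic lookup: try each namespace in turn; uniqueness makes
 * the probe order irrelevant for correctness. */
static const char *generic_lookup(const char *name)
{
    const char *r = find_in(blk_names, name);
    return r ? r : find_in(node_names, name);
}
```

This is exactly the shape of the blk_lookup() proposed in the patch series earlier in this archive: blk_by_name() first, then a scan of the attached BDS-tree roots.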