Re: [PATCH v3] virtio: force spec specified alignment on types
On 2020/4/21 上午4:46, Michael S. Tsirkin wrote: The ring element addresses are passed between components with different alignments assumptions. Thus, if guest/userspace selects a pointer and host then gets and dereferences it, we might need to decrease the compiler-selected alignment to prevent compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration, but it seems safer to handle this generally. Note that userspace that allocates the memory is actually OK and does not need to be fixed, but userspace that gets it from guest or another process does need to be fixed. The later doesn't generally talk to the kernel so while it might be buggy it's not talking to the kernel in the buggy way - it's just using the header in the buggy way - so fixing header and asking userspace to recompile is the best we can do. I verified that the produced kernel binary on x86 is exactly identical before and after the change. Signed-off-by: Michael S. Tsirkin --- changes from v2: add vring_used_elem_t to ensure alignment for substructures changes from v1: swicth all __user to the new typedefs drivers/vhost/vhost.c| 8 +++--- drivers/vhost/vhost.h| 6 ++--- drivers/vhost/vringh.c | 6 ++--- include/linux/vringh.h | 6 ++--- include/uapi/linux/virtio_ring.h | 43 5 files changed, 45 insertions(+), 24 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d450e16c5c25..bc77b0f465fd 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1244,9 +1244,9 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access) } static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used) +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used) { return access_ok(desc, vhost_get_desc_size(vq, num)) && @@ -2301,7 +2301,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, unsigned count) { - struct vring_used_elem __user *used; + vring_used_elem_t __user *used; u16 old, new; int start; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index f8403bd46b85..60cab4c78229 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -67,9 +67,9 @@ struct vhost_virtqueue { /* The actual ring of buffers. */ struct mutex mutex; unsigned int num; - struct vring_desc __user *desc; - struct vring_avail __user *avail; - struct vring_used __user *used; + vring_desc_t __user *desc; + vring_avail_t __user *avail; + vring_used_t __user *used; const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS]; struct file *kick; struct eventfd_ctx *call_ctx; diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c index ba8e0d6cfd97..e059a9a47cdf 100644 --- a/drivers/vhost/vringh.c +++ b/drivers/vhost/vringh.c @@ -620,9 +620,9 @@ static inline int xfer_to_user(const struct vringh *vrh, */ int vringh_init_user(struct vringh *vrh, u64 features, unsigned int num, bool weak_barriers, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used) +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used) { /* Sane power of 2 please! */ if (!num || num > 0x || (num & (num - 1))) { diff --git a/include/linux/vringh.h b/include/linux/vringh.h index 9e2763d7c159..59bd50f99291 100644 --- a/include/linux/vringh.h +++ b/include/linux/vringh.h @@ -105,9 +105,9 @@ struct vringh_kiov { /* Helpers for userspace vrings. */ int vringh_init_user(struct vringh *vrh, u64 features, unsigned int num, bool weak_barriers, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used); +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used); static inline void vringh_iov_init(struct vringh_iov *iov, struct iovec *iovec, unsigned num) diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h index 9223c3a5c46a..b2c20f794472 100644 --- a/include/uapi/linux/virtio_ring.h +++ b/include/uapi/linux/virtio_ring.h @@ -86,6 +86,13 @@
Re: [PATCH v4] vhost: disable for OABI
On 2020/4/20 下午10:34, Michael S. Tsirkin wrote: vhost is currently broken on the some ARM configs. The reason is that the ring element addresses are passed between components with different alignments assumptions. Thus, if guest selects a pointer and host then gets and dereferences it, then alignment assumed by the host's compiler might be greater than the actual alignment of the pointer. compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration. With this OABI, compiler assumes that all structures are 4 byte aligned - which is stronger than virtio guarantees for available and used rings, which are merely 2 bytes. Thus a guest without -mabi=apcs-gnu running on top of host with -mabi=apcs-gnu will be broken. The correct fix is to force alignment of structures - however that is an intrusive fix that's best deferred until the next release. We didn't previously support such ancient systems at all - this surfaced after vdpa support prompted removing dependency of vhost on VIRTULIZATION. So for now, let's just add something along the lines of depends on !ARM || AEABI to the virtio Kconfig declaration, and add a comment that it has to do with struct member alignment. Note: we can't make VHOST and VHOST_RING themselves have a dependency since these are selected. Add a new symbol for that. We should be able to drop this dependency down the road. Fixes: 20c384f1ea1a0bc7 ("vhost: refine vhost and vringh kconfig") Suggested-by: Ard Biesheuvel Suggested-by: Richard Earnshaw Signed-off-by: Michael S. Tsirkin --- changes from v3: update commit log clarifying the motivation and that it's a temporary fix. suggested by Christoph Hellwig drivers/misc/mic/Kconfig | 2 +- drivers/net/caif/Kconfig | 2 +- drivers/vdpa/Kconfig | 2 +- drivers/vhost/Kconfig| 17 + 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig index 8f201d019f5a..3bfe72c59864 100644 --- a/drivers/misc/mic/Kconfig +++ b/drivers/misc/mic/Kconfig @@ -116,7 +116,7 @@ config MIC_COSM config VOP tristate "VOP Driver" - depends on VOP_BUS + depends on VOP_BUS && VHOST_DPN select VHOST_RING select VIRTIO help diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig index 9db0570c5beb..661c25eb1c46 100644 --- a/drivers/net/caif/Kconfig +++ b/drivers/net/caif/Kconfig @@ -50,7 +50,7 @@ config CAIF_HSI config CAIF_VIRTIO tristate "CAIF virtio transport driver" - depends on CAIF && HAS_DMA + depends on CAIF && HAS_DMA && VHOST_DPN select VHOST_RING select VIRTIO select GENERIC_ALLOCATOR diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index 3e1ceb8e9f2b..e8140065c8a5 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -10,7 +10,7 @@ if VDPA config VDPA_SIM tristate "vDPA device simulator" - depends on RUNTIME_TESTING_MENU && HAS_DMA + depends on RUNTIME_TESTING_MENU && HAS_DMA && VHOST_DPN select VHOST_RING default n help diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 2c75d164b827..c4f273793595 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -13,6 +13,15 @@ config VHOST_RING This option is selected by any driver which needs to access the host side of a virtio ring. +config VHOST_DPN + bool + depends on !ARM || AEABI + default y + help + Anything selecting VHOST or VHOST_RING must depend on VHOST_DPN. + This excludes the deprecated ARM ABI since that forces a 4 byte + alignment on all structs - incompatible with virtio spec requirements. + config VHOST tristate select VHOST_IOTLB @@ -28,7 +37,7 @@ if VHOST_MENU config VHOST_NET tristate "Host kernel accelerator for virtio net" - depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) + depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) && VHOST_DPN select VHOST ---help--- This kernel module can be loaded in host kernel to accelerate @@ -40,7 +49,7 @@ config VHOST_NET config VHOST_SCSI tristate "VHOST_SCSI TCM fabric driver" - depends on TARGET_CORE && EVENTFD + depends on TARGET_CORE && EVENTFD && VHOST_DPN select VHOST default n ---help--- @@ -49,7 +58,7 @@ config VHOST_SCSI config VHOST_VSOCK tristate "vhost virtio-vsock driver" - depends on VSOCKETS && EVENTFD + depends on VSOCKETS && EVENTFD && VHOST_DPN select VHOST select VIRTIO_VSOCKETS_COMMON default n @@ -63,7 +72,7 @@ config VHOST_VSOCK config VHOST_VDPA tristate "Vhost driver for vDPA-based backend" - depends on EVENTFD + depends on EVENTFD && VHOS
[PATCH v3] virtio: force spec specified alignment on types
The ring element addresses are passed between components with different alignments assumptions. Thus, if guest/userspace selects a pointer and host then gets and dereferences it, we might need to decrease the compiler-selected alignment to prevent compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration, but it seems safer to handle this generally. Note that userspace that allocates the memory is actually OK and does not need to be fixed, but userspace that gets it from guest or another process does need to be fixed. The later doesn't generally talk to the kernel so while it might be buggy it's not talking to the kernel in the buggy way - it's just using the header in the buggy way - so fixing header and asking userspace to recompile is the best we can do. I verified that the produced kernel binary on x86 is exactly identical before and after the change. Signed-off-by: Michael S. Tsirkin --- changes from v2: add vring_used_elem_t to ensure alignment for substructures changes from v1: swicth all __user to the new typedefs drivers/vhost/vhost.c| 8 +++--- drivers/vhost/vhost.h| 6 ++--- drivers/vhost/vringh.c | 6 ++--- include/linux/vringh.h | 6 ++--- include/uapi/linux/virtio_ring.h | 43 5 files changed, 45 insertions(+), 24 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d450e16c5c25..bc77b0f465fd 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1244,9 +1244,9 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access) } static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used) +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used) { return access_ok(desc, vhost_get_desc_size(vq, num)) && @@ -2301,7 +2301,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, unsigned count) { - struct vring_used_elem __user *used; + vring_used_elem_t __user *used; u16 old, new; int start; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index f8403bd46b85..60cab4c78229 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -67,9 +67,9 @@ struct vhost_virtqueue { /* The actual ring of buffers. */ struct mutex mutex; unsigned int num; - struct vring_desc __user *desc; - struct vring_avail __user *avail; - struct vring_used __user *used; + vring_desc_t __user *desc; + vring_avail_t __user *avail; + vring_used_t __user *used; const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS]; struct file *kick; struct eventfd_ctx *call_ctx; diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c index ba8e0d6cfd97..e059a9a47cdf 100644 --- a/drivers/vhost/vringh.c +++ b/drivers/vhost/vringh.c @@ -620,9 +620,9 @@ static inline int xfer_to_user(const struct vringh *vrh, */ int vringh_init_user(struct vringh *vrh, u64 features, unsigned int num, bool weak_barriers, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used) +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used) { /* Sane power of 2 please! */ if (!num || num > 0x || (num & (num - 1))) { diff --git a/include/linux/vringh.h b/include/linux/vringh.h index 9e2763d7c159..59bd50f99291 100644 --- a/include/linux/vringh.h +++ b/include/linux/vringh.h @@ -105,9 +105,9 @@ struct vringh_kiov { /* Helpers for userspace vrings. */ int vringh_init_user(struct vringh *vrh, u64 features, unsigned int num, bool weak_barriers, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used); +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used); static inline void vringh_iov_init(struct vringh_iov *iov, struct iovec *iovec, unsigned num) diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h index 9223c3a5c46a..b2c20f794472 100644 --- a/include/uapi/linux/virtio_ring.h +++ b/include/uapi/linux/virtio_ring.h @@ -86,6 +86,13 @@ * at the end of the used ring. Guest should ignore the used->flags fiel
[PATCH v2] virtio: force spec specified alignment on types
The ring element addresses are passed between components with different alignments assumptions. Thus, if guest/userspace selects a pointer and host then gets and dereferences it, we might need to decrease the compiler-selected alignment to prevent compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration, but it seems safer to handle this generally. Note that userspace that allocates the memory is actually OK and does not need to be fixed, but userspace that gets it from guest or another process does need to be fixed. The later doesn't generally talk to the kernel so while it might be buggy it's not talking to the kernel in the buggy way - it's just using the header in the buggy way - so fixing header and asking userspace to recompile is the best we can do. I verified that the produced kernel binary on x86 is exactly identical before and after the change. Signed-off-by: Michael S. Tsirkin --- Changes from v1: update vhost, vringh pointers to use the new typedefs drivers/vhost/vhost.c| 8 +++ drivers/vhost/vhost.h| 6 ++--- drivers/vhost/vringh.c | 6 ++--- include/linux/vringh.h | 6 ++--- include/uapi/linux/virtio_ring.h | 38 +++- 5 files changed, 41 insertions(+), 23 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d450e16c5c25..21706759377e 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1244,9 +1244,9 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access) } static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used) +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used) { return access_ok(desc, vhost_get_desc_size(vq, num)) && @@ -2301,7 +2301,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads, unsigned count) { - struct vring_used_elem __user *used; + vring_used_t __user *used; u16 old, new; int start; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index f8403bd46b85..60cab4c78229 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -67,9 +67,9 @@ struct vhost_virtqueue { /* The actual ring of buffers. */ struct mutex mutex; unsigned int num; - struct vring_desc __user *desc; - struct vring_avail __user *avail; - struct vring_used __user *used; + vring_desc_t __user *desc; + vring_avail_t __user *avail; + vring_used_t __user *used; const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS]; struct file *kick; struct eventfd_ctx *call_ctx; diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c index ba8e0d6cfd97..e059a9a47cdf 100644 --- a/drivers/vhost/vringh.c +++ b/drivers/vhost/vringh.c @@ -620,9 +620,9 @@ static inline int xfer_to_user(const struct vringh *vrh, */ int vringh_init_user(struct vringh *vrh, u64 features, unsigned int num, bool weak_barriers, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used) +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used) { /* Sane power of 2 please! */ if (!num || num > 0x || (num & (num - 1))) { diff --git a/include/linux/vringh.h b/include/linux/vringh.h index 9e2763d7c159..59bd50f99291 100644 --- a/include/linux/vringh.h +++ b/include/linux/vringh.h @@ -105,9 +105,9 @@ struct vringh_kiov { /* Helpers for userspace vrings. */ int vringh_init_user(struct vringh *vrh, u64 features, unsigned int num, bool weak_barriers, -struct vring_desc __user *desc, -struct vring_avail __user *avail, -struct vring_used __user *used); +vring_desc_t __user *desc, +vring_avail_t __user *avail, +vring_used_t __user *used); static inline void vringh_iov_init(struct vringh_iov *iov, struct iovec *iovec, unsigned num) diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h index 9223c3a5c46a..177227f0d9cd 100644 --- a/include/uapi/linux/virtio_ring.h +++ b/include/uapi/linux/virtio_ring.h @@ -118,16 +118,6 @@ struct vring_used { struct vring_used_elem ring[]; }; -struct vring { - unsigned int num; - - struct vring_desc *desc; -
[GIT PULL v2] vhost: cleanups and fixes
The following changes since commit 8f3d9f354286745c751374f5f1fcafee6b3f3136: Linux 5.7-rc1 (2020-04-12 12:35:55 -0700) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus for you to fetch changes up to d085eb8ce727e581abf8145244eaa3339021be2f: vhost: disable for OABI (2020-04-20 10:19:22 -0400) Changes from v1: Dropped a bunch of cleanups which turned out to be controversial This has been in next for a while, though I tweaked some commit logs so the hashes differ. virtio: fixes, cleanups Some bug fixes. Cleanup a couple of issues that surfaced meanwhile. Disable vhost on ARM with OABI for now - to be fixed fully later in the cycle or in the next release. Signed-off-by: Michael S. Tsirkin Alexander Duyck (1): virtio-balloon: Avoid using the word 'report' when referring to free page hinting Eugenio Pérez (1): vhost: Create accessors for virtqueues private_data Gustavo A. R. Silva (1): vhost: vdpa: remove unnecessary null check Jason Wang (1): vdpa: fix comment of vdpa_register_device() Jason Yan (2): vhost: remove set but not used variable 'status' virtio-balloon: make virtballoon_free_page_report() static Michael S. Tsirkin (15): vdpa-sim: depend on HAS_DMA virtio/test: fix up after IOTLB changes tools/virtio: define aligned attribute tools/virtio: make asm/barrier.h self contained virtgpu: pull in uaccess.h virtio-rng: pull in slab.h remoteproc: pull in slab.h virtio_input: pull in slab.h rpmsg: pull in slab.h remoteproc: pull in slab.h vdpa: allow a 32 bit vq alignment vdpa: make vhost, virtio depend on menu virtio_blk: add a missing include virtio: drop vringh.h dependency vhost: disable for OABI Stephen Rothwell (1): drm/virtio: fix up for include file changes YueHaibing (2): vdpa: remove unused variables 'ifcvf' and 'ifcvf_lm' vdpasim: Return status in vdpasim_get_status drivers/block/virtio_blk.c | 1 + drivers/char/hw_random/virtio-rng.c| 1 + drivers/gpu/drm/virtio/virtgpu_ioctl.c | 1 + drivers/gpu/drm/virtio/virtgpu_kms.c | 1 + drivers/misc/mic/Kconfig | 2 +- drivers/net/caif/Kconfig | 2 +- drivers/remoteproc/remoteproc_sysfs.c | 1 + drivers/remoteproc/stm32_rproc.c | 1 + drivers/rpmsg/mtk_rpmsg.c | 1 + drivers/vdpa/Kconfig | 18 ++ drivers/vdpa/ifcvf/ifcvf_base.c| 2 -- drivers/vdpa/ifcvf/ifcvf_main.c| 4 +--- drivers/vdpa/vdpa.c| 2 +- drivers/vdpa/vdpa_sim/vdpa_sim.c | 4 ++-- drivers/vhost/Kconfig | 21 - drivers/vhost/net.c| 28 +++- drivers/vhost/scsi.c | 14 +++--- drivers/vhost/test.c | 14 +++--- drivers/vhost/vdpa.c | 5 - drivers/vhost/vhost.h | 27 +++ drivers/vhost/vringh.c | 5 + drivers/vhost/vsock.c | 14 +++--- drivers/virtio/Kconfig | 2 +- drivers/virtio/virtio_balloon.c| 4 ++-- drivers/virtio/virtio_input.c | 1 + include/linux/vdpa.h | 2 +- include/linux/virtio.h | 1 - include/linux/vringh.h | 6 ++ include/uapi/linux/virtio_balloon.h| 11 +-- tools/virtio/Makefile | 5 +++-- tools/virtio/asm/barrier.h | 1 + tools/virtio/generated/autoconf.h | 0 tools/virtio/linux/compiler.h | 1 + 33 files changed, 128 insertions(+), 75 deletions(-) create mode 100644 tools/virtio/generated/autoconf.h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
On Mon, Apr 20, 2020 at 6:28 AM David Hildenbrand wrote: > > >>> Now we are talking about what's safe to do with the page. > >>> > >>> If POISON flag is set by hypervisor but clear by guest, > >>> or poison_val is 0, then it's clear it's safe to blow > >>> away the page if we can figure out it's unused. > >>> > >>> Otherwise, it's much less clear :) > >> > >> Hah! Agreed :D > > > > That isn't quite true. The problem is in the case of hinting it isn't > > setting the page to 0. It is simply not migrating it. So if there is > > data from an earlier pass it is stuck at that value. So the balloon > > will see the poison/init on some pages cleared, however I suppose the > > balloon doesn't care about the contents of the page. For the pages > > that are leaked back out via the shrinker they will be dirtied so they > > will end up being migrated with the correct value eventually. > > Right, I think current Linux guests are fine. The critical piece we are > talking about is > > 1) Guest balloon allocates and hints a page > 2) Hypervisor does not process hinting request > 3) Guest frees the page and reuses it (especially, writes it). > 4) Hypervisor processes the hinting request. > > AFAIU, as soon as the guest writes the page (e.g., zeroes it/poisons it > in the buddy, or somebody who allocated it), the page *will* get > migrated, even if 4) happens after 3). That's guaranteed by the 2-bitmap > magic. Yes. Basically the page will be flagged as dirty as soon as it is written to, and that will be dealt with after the current batch of hints are processed so we should be able to guarantee that the page will have a coherent value stored in it. > Now, assume the following happens (in some future Linux version) (due to > your "simply not migrating it" comment): > > 1) Guest balloon allocates and hints a page. Assume the page is zero due > to want_init_on_free(). > 2) Hypervisor processes the hinting request. > 3) Guest frees the page. Assume we are implementing some magic to "skip" > zeroing, as we assume it is still zero. > > Due to 2), the page won't get migrated. In 3) we expect the page to be > 0. QEMU would have to make sure that we always get either the original, > or a zero page on the destination. Otherwise, this smells like data > corruption. I agree and that is my concern. From the time the page is hinted until it is written to again it is in an indeterminate state. However with the current Linux guest implementation it is being written to so technically there isn't an issue. We would just need to make sure it stays that way. In addition it is already an exposed interface and this is the way it works. I think if anything we are likely forced to document it as-is and guarantee that the behavior is not changed. > > > >>> I'll have to come back and re-read the rest next week, this > >>> is complex stuff and I'm too rushed with other things today. > >> > >> Yeah, I'm also loaded with other stuff. Maybe Alex has time to > >> understand the details as well. > > > > So after looking it over again it makes a bit more sense why this > > hasn't caused any issues so far, and I can see why the poison enabled > > setup and hinting can work. The problem is I am left wondering what > > assumptions we are allowed to leave in place. > > > > 1. Can we assume that we don't care about the contents in the pages in > > the balloon changing? > > I think, we should define valid ways for the hypervisor to change it. > > "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by > a zero page. However, as soon as the page is written by the guest (even > before the hinting request was processed by the host), the modified page > will stay - whereby the unwritten parts might either be from the old, or > from the zero page." Actually I don't think the zero page is accurate. A page that is hinted basically implies we don't care about the content. I would think that we could treat a hinted page as uninitialized memory. > I think the debatable part is "whereby the unwritten parts might either > be from the old, or from the zero page". AFAIU, you think it could > happen in QEMU, that we have neither the old, nor the zero page, but > instead some previous content. The question is if that's valid, or if > that's a BUG in QEMU. If it's valid, we can do no optimizations in the > guest (e.g., skip zeroing in the buddy). And I agree that this smells > like "data corruption" as Michael said. So if any part of the page is written to after it is hinted you will be getting the modified page since it goes back to being "dirty". All the hinting is doing is skipping the migration of dirty pages that are "hinted" as long as they are not written to again. So the pages are valid up until we migrate to the new system. At that point all of the pages that are in the page hinting balloon will be stale data as we skipped migrating them. Assuming something like page poisoning or init_on_free are enabled they will be poisoned again when they are
[PATCH v4] vhost: disable for OABI
vhost is currently broken on the some ARM configs. The reason is that the ring element addresses are passed between components with different alignments assumptions. Thus, if guest selects a pointer and host then gets and dereferences it, then alignment assumed by the host's compiler might be greater than the actual alignment of the pointer. compiler on the host from assuming pointer is aligned. This actually triggers on ARM with -mabi=apcs-gnu - which is a deprecated configuration. With this OABI, compiler assumes that all structures are 4 byte aligned - which is stronger than virtio guarantees for available and used rings, which are merely 2 bytes. Thus a guest without -mabi=apcs-gnu running on top of host with -mabi=apcs-gnu will be broken. The correct fix is to force alignment of structures - however that is an intrusive fix that's best deferred until the next release. We didn't previously support such ancient systems at all - this surfaced after vdpa support prompted removing dependency of vhost on VIRTULIZATION. So for now, let's just add something along the lines of depends on !ARM || AEABI to the virtio Kconfig declaration, and add a comment that it has to do with struct member alignment. Note: we can't make VHOST and VHOST_RING themselves have a dependency since these are selected. Add a new symbol for that. We should be able to drop this dependency down the road. Fixes: 20c384f1ea1a0bc7 ("vhost: refine vhost and vringh kconfig") Suggested-by: Ard Biesheuvel Suggested-by: Richard Earnshaw Signed-off-by: Michael S. Tsirkin --- changes from v3: update commit log clarifying the motivation and that it's a temporary fix. suggested by Christoph Hellwig drivers/misc/mic/Kconfig | 2 +- drivers/net/caif/Kconfig | 2 +- drivers/vdpa/Kconfig | 2 +- drivers/vhost/Kconfig| 17 + 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig index 8f201d019f5a..3bfe72c59864 100644 --- a/drivers/misc/mic/Kconfig +++ b/drivers/misc/mic/Kconfig @@ -116,7 +116,7 @@ config MIC_COSM config VOP tristate "VOP Driver" - depends on VOP_BUS + depends on VOP_BUS && VHOST_DPN select VHOST_RING select VIRTIO help diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig index 9db0570c5beb..661c25eb1c46 100644 --- a/drivers/net/caif/Kconfig +++ b/drivers/net/caif/Kconfig @@ -50,7 +50,7 @@ config CAIF_HSI config CAIF_VIRTIO tristate "CAIF virtio transport driver" - depends on CAIF && HAS_DMA + depends on CAIF && HAS_DMA && VHOST_DPN select VHOST_RING select VIRTIO select GENERIC_ALLOCATOR diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index 3e1ceb8e9f2b..e8140065c8a5 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -10,7 +10,7 @@ if VDPA config VDPA_SIM tristate "vDPA device simulator" - depends on RUNTIME_TESTING_MENU && HAS_DMA + depends on RUNTIME_TESTING_MENU && HAS_DMA && VHOST_DPN select VHOST_RING default n help diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 2c75d164b827..c4f273793595 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -13,6 +13,15 @@ config VHOST_RING This option is selected by any driver which needs to access the host side of a virtio ring. +config VHOST_DPN + bool + depends on !ARM || AEABI + default y + help + Anything selecting VHOST or VHOST_RING must depend on VHOST_DPN. + This excludes the deprecated ARM ABI since that forces a 4 byte + alignment on all structs - incompatible with virtio spec requirements. + config VHOST tristate select VHOST_IOTLB @@ -28,7 +37,7 @@ if VHOST_MENU config VHOST_NET tristate "Host kernel accelerator for virtio net" - depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) + depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) && VHOST_DPN select VHOST ---help--- This kernel module can be loaded in host kernel to accelerate @@ -40,7 +49,7 @@ config VHOST_NET config VHOST_SCSI tristate "VHOST_SCSI TCM fabric driver" - depends on TARGET_CORE && EVENTFD + depends on TARGET_CORE && EVENTFD && VHOST_DPN select VHOST default n ---help--- @@ -49,7 +58,7 @@ config VHOST_SCSI config VHOST_VSOCK tristate "vhost virtio-vsock driver" - depends on VSOCKETS && EVENTFD + depends on VSOCKETS && EVENTFD && VHOST_DPN select VHOST select VIRTIO_VSOCKETS_COMMON default n @@ -63,7 +72,7 @@ config VHOST_VSOCK config VHOST_VDPA tristate "Vhost driver for vDPA-based backend" - depends on EVENTFD + depends on EVENTFD && VHOST_DPN select VHOST depends on VDPA help -- MST ___
Re: [PATCH v3] vhost: disable for OABI
On Mon, Apr 20, 2020 at 01:29:09AM -0700, Christoph Hellwig wrote: > On Thu, Apr 16, 2020 at 06:20:20PM -0400, Michael S. Tsirkin wrote: > > vhost is currently broken on the some ARM configs. > > > > The reason is that that uses apcs-gnu which is the ancient OABI that is been > > deprecated for a long time. > > > > Given that virtio support on such ancient systems is not needed in the > > first place, let's just add something along the lines of > > > > depends on !ARM || AEABI > > > > to the virtio Kconfig declaration, and add a comment that it has to do > > with struct member alignment. > > > > Note: we can't make VHOST and VHOST_RING themselves have > > a dependency since these are selected. Add a new symbol for that. > > This description is horrible. The only interesting thing for ARM OABI > is that it has some strange padding rules, but that isn't something > that can't be handled. Please spend some time looking into the issue > and add te proper __padded annotations, we've done that elsewhere in > the kernel and it isn't too bad - in fact it helps understanding issues > with implicit alignment. Yes I have a patch queued to fix it. I wanted a minimal patch for this release though. > And even if you have a good reason not to fix vhost (which I think you > don't have) this changelog is just utter crap, as it fails to mention > what the problem with ARM OABI even is. I'll tweak that, thanks! -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled
>>> Now we are talking about what's safe to do with the page. >>> >>> If POISON flag is set by hypervisor but clear by guest, >>> or poison_val is 0, then it's clear it's safe to blow >>> away the page if we can figure out it's unused. >>> >>> Otherwise, it's much less clear :) >> >> Hah! Agreed :D > > That isn't quite true. The problem is in the case of hinting it isn't > setting the page to 0. It is simply not migrating it. So if there is > data from an earlier pass it is stuck at that value. So the balloon > will see the poison/init on some pages cleared, however I suppose the > balloon doesn't care about the contents of the page. For the pages > that are leaked back out via the shrinker they will be dirtied so they > will end up being migrated with the correct value eventually. Right, I think current Linux guests are fine. The critical piece we are talking about is 1) Guest balloon allocates and hints a page 2) Hypervisor does not process hinting request 3) Guest frees the page and reuses it (especially, writes it). 4) Hypervisor processes the hinting request. AFAIU, as soon as the guest writes the page (e.g., zeroes it/poisons it in the buddy, or somebody who allocated it), the page *will* get migrated, even if 4) happens after 3). That's guaranteed by the 2-bitmap magic. Now, assume the following happens (in some future Linux version) (due to your "simply not migrating it" comment): 1) Guest balloon allocates and hints a page. Assume the page is zero due to want_init_on_free(). 2) Hypervisor processes the hinting request. 3) Guest frees the page. Assume we are implementing some magic to "skip" zeroing, as we assume it is still zero. Due to 2), the page won't get migrated. In 3) we expect the page to be 0. QEMU would have to make sure that we always get either the original, or a zero page on the destination. Otherwise, this smells like data corruption. > >>> I'll have to come back and re-read the rest next week, this >>> is complex stuff and I'm too rushed with other things today. >> >> Yeah, I'm also loaded with other stuff. Maybe Alex has time to >> understand the details as well. > > So after looking it over again it makes a bit more sense why this > hasn't caused any issues so far, and I can see why the poison enabled > setup and hinting can work. The problem is I am left wondering what > assumptions we are allowed to leave in place. > > 1. Can we assume that we don't care about the contents in the pages in > the balloon changing? I think, we should define valid ways for the hypervisor to change it. "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by a zero page. However, as soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay - whereby the unwritten parts might either be from the old, or from the zero page." I think the debatable part is "whereby the unwritten parts might either be from the old, or from the zero page". AFAIU, you think it could happen in QEMU, that we have neither the old, nor the zero page, but instead some previous content. The question is if that's valid, or if that's a BUG in QEMU. If it's valid, we can do no optimizations in the guest (e.g., skip zeroing in the buddy). And I agree that this smells like "data corruption" as Michael said. > 2. Can we assume that the guest will always rewrite the page after the > deflate in the case of init_on_free or poison? Depends on what we think is the right way to do - IOW if we think "some other content" as mentioned above is a BUG or not. > 3. Can we assume that free page hinting will always function as a > balloon setup, so no moving it over to a page reporting type setup? I think we have to define the valid semantics. That implies what would be valid to do with it. Unfortunately, we have to reverse-engineer here. > > If we assume the above 3 items then there isn't any point in worrying > about poison when it comes to free page hinting. It doesn't make sense > to since they essentially negate it. As such I could go through this > patch and the QEMU patches and clean up any associations since the to > are not really tied together in any way. The big question is, if we want to support VIRTIO_BALLOON_F_PAGE_POISON with free page hinting. e.g.,: "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by a page full of X. However, as soon as the page is written by the guest (even before the hinting request was processed by the host), the modified page will stay - whereby the unwritten parts might either be from the old, or from a page filled with X. Without VIRTIO_BALLOON_F_PAGE_POISON, X is zero, otherwise it is poison_val." The current QEMU implementation would be to simply migrate all hinted pages. In the future we could optimize. Not sure if it's worth the trouble. -- Thanks, David / dhildenb ___ Virtualization mailing list Virtualization@lists.linux-foun
Re: [PATCH v3] vhost: disable for OABI
On Thu, Apr 16, 2020 at 06:20:20PM -0400, Michael S. Tsirkin wrote: > vhost is currently broken on the some ARM configs. > > The reason is that that uses apcs-gnu which is the ancient OABI that is been > deprecated for a long time. > > Given that virtio support on such ancient systems is not needed in the > first place, let's just add something along the lines of > > depends on !ARM || AEABI > > to the virtio Kconfig declaration, and add a comment that it has to do > with struct member alignment. > > Note: we can't make VHOST and VHOST_RING themselves have > a dependency since these are selected. Add a new symbol for that. This description is horrible. The only interesting thing for ARM OABI is that it has some strange padding rules, but that isn't something that can't be handled. Please spend some time looking into the issue and add te proper __padded annotations, we've done that elsewhere in the kernel and it isn't too bad - in fact it helps understanding issues with implicit alignment. And even if you have a good reason not to fix vhost (which I think you don't have) this changelog is just utter crap, as it fails to mention what the problem with ARM OABI even is. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization