Re: [PATCH v3] virtio: force spec specified alignment on types

2020-04-20 Thread Jason Wang


On 2020/4/21 上午4:46, Michael S. Tsirkin wrote:

The ring element addresses are passed between components with different
alignments assumptions. Thus, if guest/userspace selects a pointer and
host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent compiler on the host from
assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.

Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from guest or another
process does need to be fixed. The later doesn't generally talk to the
kernel so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing
header and asking userspace to recompile is the best we can do.

I verified that the produced kernel binary on x86 is exactly identical
before and after the change.

Signed-off-by: Michael S. Tsirkin 
---

changes from v2:
add vring_used_elem_t to ensure alignment for substructures
changes from v1:
swicth all __user to the new typedefs

  drivers/vhost/vhost.c|  8 +++---
  drivers/vhost/vhost.h|  6 ++---
  drivers/vhost/vringh.c   |  6 ++---
  include/linux/vringh.h   |  6 ++---
  include/uapi/linux/virtio_ring.h | 43 
  5 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d450e16c5c25..bc77b0f465fd 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1244,9 +1244,9 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, 
u64 iova, int access)
  }
  
  static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,

-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used)
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used)
  
  {

return access_ok(desc, vhost_get_desc_size(vq, num)) &&
@@ -2301,7 +2301,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
unsigned count)
  {
-   struct vring_used_elem __user *used;
+   vring_used_elem_t __user *used;
u16 old, new;
int start;
  
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h

index f8403bd46b85..60cab4c78229 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -67,9 +67,9 @@ struct vhost_virtqueue {
/* The actual ring of buffers. */
struct mutex mutex;
unsigned int num;
-   struct vring_desc __user *desc;
-   struct vring_avail __user *avail;
-   struct vring_used __user *used;
+   vring_desc_t __user *desc;
+   vring_avail_t __user *avail;
+   vring_used_t __user *used;
const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
struct file *kick;
struct eventfd_ctx *call_ctx;
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index ba8e0d6cfd97..e059a9a47cdf 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -620,9 +620,9 @@ static inline int xfer_to_user(const struct vringh *vrh,
   */
  int vringh_init_user(struct vringh *vrh, u64 features,
 unsigned int num, bool weak_barriers,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used)
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used)
  {
/* Sane power of 2 please! */
if (!num || num > 0x || (num & (num - 1))) {
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9e2763d7c159..59bd50f99291 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -105,9 +105,9 @@ struct vringh_kiov {
  /* Helpers for userspace vrings. */
  int vringh_init_user(struct vringh *vrh, u64 features,
 unsigned int num, bool weak_barriers,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used);
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used);
  
  static inline void vringh_iov_init(struct vringh_iov *iov,

   struct iovec *iovec, unsigned num)
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 9223c3a5c46a..b2c20f794472 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -86,6 +86,13 @@

Re: [PATCH v4] vhost: disable for OABI

2020-04-20 Thread Jason Wang


On 2020/4/20 下午10:34, Michael S. Tsirkin wrote:

vhost is currently broken on the some ARM configs.

The reason is that the ring element addresses are passed between
components with different alignments assumptions. Thus, if
guest selects a pointer and host then gets and dereferences
it, then alignment assumed by the host's compiler might be
greater than the actual alignment of the pointer.
compiler on the host from assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration. With this OABI, compiler assumes that
all structures are 4 byte aligned - which is stronger than
virtio guarantees for available and used rings, which are
merely 2 bytes. Thus a guest without -mabi=apcs-gnu running
on top of host with -mabi=apcs-gnu will be broken.

The correct fix is to force alignment of structures - however
that is an intrusive fix that's best deferred until the next release.

We didn't previously support such ancient systems at all - this surfaced
after vdpa support prompted removing dependency of vhost on
VIRTULIZATION. So for now, let's just add something along the lines of

depends on !ARM || AEABI

to the virtio Kconfig declaration, and add a comment that it has to do
with struct member alignment.

Note: we can't make VHOST and VHOST_RING themselves have
a dependency since these are selected. Add a new symbol for that.

We should be able to drop this dependency down the road.

Fixes: 20c384f1ea1a0bc7 ("vhost: refine vhost and vringh kconfig")
Suggested-by: Ard Biesheuvel 
Suggested-by: Richard Earnshaw 
Signed-off-by: Michael S. Tsirkin 
---

changes from v3:
update commit log clarifying the motivation and that
it's a temporary fix.

suggested by Christoph Hellwig

  drivers/misc/mic/Kconfig |  2 +-
  drivers/net/caif/Kconfig |  2 +-
  drivers/vdpa/Kconfig |  2 +-
  drivers/vhost/Kconfig| 17 +
  4 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 8f201d019f5a..3bfe72c59864 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -116,7 +116,7 @@ config MIC_COSM
  
  config VOP

tristate "VOP Driver"
-   depends on VOP_BUS
+   depends on VOP_BUS && VHOST_DPN
select VHOST_RING
select VIRTIO
help
diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
index 9db0570c5beb..661c25eb1c46 100644
--- a/drivers/net/caif/Kconfig
+++ b/drivers/net/caif/Kconfig
@@ -50,7 +50,7 @@ config CAIF_HSI
  
  config CAIF_VIRTIO

tristate "CAIF virtio transport driver"
-   depends on CAIF && HAS_DMA
+   depends on CAIF && HAS_DMA && VHOST_DPN
select VHOST_RING
select VIRTIO
select GENERIC_ALLOCATOR
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3e1ceb8e9f2b..e8140065c8a5 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -10,7 +10,7 @@ if VDPA
  
  config VDPA_SIM

tristate "vDPA device simulator"
-   depends on RUNTIME_TESTING_MENU && HAS_DMA
+   depends on RUNTIME_TESTING_MENU && HAS_DMA && VHOST_DPN
select VHOST_RING
default n
help
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 2c75d164b827..c4f273793595 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -13,6 +13,15 @@ config VHOST_RING
  This option is selected by any driver which needs to access
  the host side of a virtio ring.
  
+config VHOST_DPN

+   bool
+   depends on !ARM || AEABI
+   default y
+   help
+ Anything selecting VHOST or VHOST_RING must depend on VHOST_DPN.
+ This excludes the deprecated ARM ABI since that forces a 4 byte
+ alignment on all structs - incompatible with virtio spec requirements.
+
  config VHOST
tristate
select VHOST_IOTLB
@@ -28,7 +37,7 @@ if VHOST_MENU
  
  config VHOST_NET

tristate "Host kernel accelerator for virtio net"
-   depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP)
+   depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) && VHOST_DPN
select VHOST
---help---
  This kernel module can be loaded in host kernel to accelerate
@@ -40,7 +49,7 @@ config VHOST_NET
  
  config VHOST_SCSI

tristate "VHOST_SCSI TCM fabric driver"
-   depends on TARGET_CORE && EVENTFD
+   depends on TARGET_CORE && EVENTFD && VHOST_DPN
select VHOST
default n
---help---
@@ -49,7 +58,7 @@ config VHOST_SCSI
  
  config VHOST_VSOCK

tristate "vhost virtio-vsock driver"
-   depends on VSOCKETS && EVENTFD
+   depends on VSOCKETS && EVENTFD && VHOST_DPN
select VHOST
select VIRTIO_VSOCKETS_COMMON
default n
@@ -63,7 +72,7 @@ config VHOST_VSOCK
  
  config VHOST_VDPA

tristate "Vhost driver for vDPA-based backend"
-   depends on EVENTFD
+   depends on EVENTFD && VHOS

[PATCH v3] virtio: force spec specified alignment on types

2020-04-20 Thread Michael S. Tsirkin
The ring element addresses are passed between components with different
alignments assumptions. Thus, if guest/userspace selects a pointer and
host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent compiler on the host from
assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.

Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from guest or another
process does need to be fixed. The later doesn't generally talk to the
kernel so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing
header and asking userspace to recompile is the best we can do.

I verified that the produced kernel binary on x86 is exactly identical
before and after the change.

Signed-off-by: Michael S. Tsirkin 
---

changes from v2:
add vring_used_elem_t to ensure alignment for substructures
changes from v1:
swicth all __user to the new typedefs

 drivers/vhost/vhost.c|  8 +++---
 drivers/vhost/vhost.h|  6 ++---
 drivers/vhost/vringh.c   |  6 ++---
 include/linux/vringh.h   |  6 ++---
 include/uapi/linux/virtio_ring.h | 43 
 5 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d450e16c5c25..bc77b0f465fd 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1244,9 +1244,9 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, 
u64 iova, int access)
 }
 
 static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used)
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used)
 
 {
return access_ok(desc, vhost_get_desc_size(vq, num)) &&
@@ -2301,7 +2301,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
unsigned count)
 {
-   struct vring_used_elem __user *used;
+   vring_used_elem_t __user *used;
u16 old, new;
int start;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f8403bd46b85..60cab4c78229 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -67,9 +67,9 @@ struct vhost_virtqueue {
/* The actual ring of buffers. */
struct mutex mutex;
unsigned int num;
-   struct vring_desc __user *desc;
-   struct vring_avail __user *avail;
-   struct vring_used __user *used;
+   vring_desc_t __user *desc;
+   vring_avail_t __user *avail;
+   vring_used_t __user *used;
const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
struct file *kick;
struct eventfd_ctx *call_ctx;
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index ba8e0d6cfd97..e059a9a47cdf 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -620,9 +620,9 @@ static inline int xfer_to_user(const struct vringh *vrh,
  */
 int vringh_init_user(struct vringh *vrh, u64 features,
 unsigned int num, bool weak_barriers,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used)
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used)
 {
/* Sane power of 2 please! */
if (!num || num > 0x || (num & (num - 1))) {
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9e2763d7c159..59bd50f99291 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -105,9 +105,9 @@ struct vringh_kiov {
 /* Helpers for userspace vrings. */
 int vringh_init_user(struct vringh *vrh, u64 features,
 unsigned int num, bool weak_barriers,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used);
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used);
 
 static inline void vringh_iov_init(struct vringh_iov *iov,
   struct iovec *iovec, unsigned num)
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 9223c3a5c46a..b2c20f794472 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -86,6 +86,13 @@
  * at the end of the used ring. Guest should ignore the used->flags fiel

[PATCH v2] virtio: force spec specified alignment on types

2020-04-20 Thread Michael S. Tsirkin
The ring element addresses are passed between components with different
alignments assumptions. Thus, if guest/userspace selects a pointer and
host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent compiler on the host from
assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.

Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from guest or another
process does need to be fixed. The later doesn't generally talk to the
kernel so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing
header and asking userspace to recompile is the best we can do.

I verified that the produced kernel binary on x86 is exactly identical
before and after the change.

Signed-off-by: Michael S. Tsirkin 
---

Changes from v1:
update vhost, vringh pointers to use the new typedefs

 drivers/vhost/vhost.c|  8 +++
 drivers/vhost/vhost.h|  6 ++---
 drivers/vhost/vringh.c   |  6 ++---
 include/linux/vringh.h   |  6 ++---
 include/uapi/linux/virtio_ring.h | 38 +++-
 5 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d450e16c5c25..21706759377e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1244,9 +1244,9 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, 
u64 iova, int access)
 }
 
 static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used)
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used)
 
 {
return access_ok(desc, vhost_get_desc_size(vq, num)) &&
@@ -2301,7 +2301,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
unsigned count)
 {
-   struct vring_used_elem __user *used;
+   vring_used_t __user *used;
u16 old, new;
int start;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f8403bd46b85..60cab4c78229 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -67,9 +67,9 @@ struct vhost_virtqueue {
/* The actual ring of buffers. */
struct mutex mutex;
unsigned int num;
-   struct vring_desc __user *desc;
-   struct vring_avail __user *avail;
-   struct vring_used __user *used;
+   vring_desc_t __user *desc;
+   vring_avail_t __user *avail;
+   vring_used_t __user *used;
const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
struct file *kick;
struct eventfd_ctx *call_ctx;
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index ba8e0d6cfd97..e059a9a47cdf 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -620,9 +620,9 @@ static inline int xfer_to_user(const struct vringh *vrh,
  */
 int vringh_init_user(struct vringh *vrh, u64 features,
 unsigned int num, bool weak_barriers,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used)
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used)
 {
/* Sane power of 2 please! */
if (!num || num > 0x || (num & (num - 1))) {
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9e2763d7c159..59bd50f99291 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -105,9 +105,9 @@ struct vringh_kiov {
 /* Helpers for userspace vrings. */
 int vringh_init_user(struct vringh *vrh, u64 features,
 unsigned int num, bool weak_barriers,
-struct vring_desc __user *desc,
-struct vring_avail __user *avail,
-struct vring_used __user *used);
+vring_desc_t __user *desc,
+vring_avail_t __user *avail,
+vring_used_t __user *used);
 
 static inline void vringh_iov_init(struct vringh_iov *iov,
   struct iovec *iovec, unsigned num)
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 9223c3a5c46a..177227f0d9cd 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -118,16 +118,6 @@ struct vring_used {
struct vring_used_elem ring[];
 };
 
-struct vring {
-   unsigned int num;
-
-   struct vring_desc *desc;
-

[GIT PULL v2] vhost: cleanups and fixes

2020-04-20 Thread Michael S. Tsirkin
The following changes since commit 8f3d9f354286745c751374f5f1fcafee6b3f3136:

  Linux 5.7-rc1 (2020-04-12 12:35:55 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to d085eb8ce727e581abf8145244eaa3339021be2f:

  vhost: disable for OABI (2020-04-20 10:19:22 -0400)

Changes from v1:
Dropped a bunch of cleanups which turned out to be controversial

This has been in next for a while, though I tweaked some commit
logs so the hashes differ.


virtio: fixes, cleanups

Some bug fixes.
Cleanup a couple of issues that surfaced meanwhile.
Disable vhost on ARM with OABI for now - to be fixed
fully later in the cycle or in the next release.

Signed-off-by: Michael S. Tsirkin 


Alexander Duyck (1):
  virtio-balloon: Avoid using the word 'report' when referring to free page 
hinting

Eugenio Pérez (1):
  vhost: Create accessors for virtqueues private_data

Gustavo A. R. Silva (1):
  vhost: vdpa: remove unnecessary null check

Jason Wang (1):
  vdpa: fix comment of vdpa_register_device()

Jason Yan (2):
  vhost: remove set but not used variable 'status'
  virtio-balloon: make virtballoon_free_page_report() static

Michael S. Tsirkin (15):
  vdpa-sim: depend on HAS_DMA
  virtio/test: fix up after IOTLB changes
  tools/virtio: define aligned attribute
  tools/virtio: make asm/barrier.h self contained
  virtgpu: pull in uaccess.h
  virtio-rng: pull in slab.h
  remoteproc: pull in slab.h
  virtio_input: pull in slab.h
  rpmsg: pull in slab.h
  remoteproc: pull in slab.h
  vdpa: allow a 32 bit vq alignment
  vdpa: make vhost, virtio depend on menu
  virtio_blk: add a missing include
  virtio: drop vringh.h dependency
  vhost: disable for OABI

Stephen Rothwell (1):
  drm/virtio: fix up for include file changes

YueHaibing (2):
  vdpa: remove unused variables 'ifcvf' and 'ifcvf_lm'
  vdpasim: Return status in vdpasim_get_status

 drivers/block/virtio_blk.c |  1 +
 drivers/char/hw_random/virtio-rng.c|  1 +
 drivers/gpu/drm/virtio/virtgpu_ioctl.c |  1 +
 drivers/gpu/drm/virtio/virtgpu_kms.c   |  1 +
 drivers/misc/mic/Kconfig   |  2 +-
 drivers/net/caif/Kconfig   |  2 +-
 drivers/remoteproc/remoteproc_sysfs.c  |  1 +
 drivers/remoteproc/stm32_rproc.c   |  1 +
 drivers/rpmsg/mtk_rpmsg.c  |  1 +
 drivers/vdpa/Kconfig   | 18 ++
 drivers/vdpa/ifcvf/ifcvf_base.c|  2 --
 drivers/vdpa/ifcvf/ifcvf_main.c|  4 +---
 drivers/vdpa/vdpa.c|  2 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c   |  4 ++--
 drivers/vhost/Kconfig  | 21 -
 drivers/vhost/net.c| 28 +++-
 drivers/vhost/scsi.c   | 14 +++---
 drivers/vhost/test.c   | 14 +++---
 drivers/vhost/vdpa.c   |  5 -
 drivers/vhost/vhost.h  | 27 +++
 drivers/vhost/vringh.c |  5 +
 drivers/vhost/vsock.c  | 14 +++---
 drivers/virtio/Kconfig |  2 +-
 drivers/virtio/virtio_balloon.c|  4 ++--
 drivers/virtio/virtio_input.c  |  1 +
 include/linux/vdpa.h   |  2 +-
 include/linux/virtio.h |  1 -
 include/linux/vringh.h |  6 ++
 include/uapi/linux/virtio_balloon.h| 11 +--
 tools/virtio/Makefile  |  5 +++--
 tools/virtio/asm/barrier.h |  1 +
 tools/virtio/generated/autoconf.h  |  0
 tools/virtio/linux/compiler.h  |  1 +
 33 files changed, 128 insertions(+), 75 deletions(-)
 create mode 100644 tools/virtio/generated/autoconf.h

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled

2020-04-20 Thread Alexander Duyck
On Mon, Apr 20, 2020 at 6:28 AM David Hildenbrand  wrote:
>
> >>> Now we are talking about what's safe to do with the page.
> >>>
> >>> If POISON flag is set by hypervisor but clear by guest,
> >>> or poison_val is 0, then it's clear it's safe to blow
> >>> away the page if we can figure out it's unused.
> >>>
> >>> Otherwise, it's much less clear :)
> >>
> >> Hah! Agreed :D
> >
> > That isn't quite true. The problem is in the case of hinting it isn't
> > setting the page to 0. It is simply not migrating it. So if there is
> > data from an earlier pass it is stuck at that value. So the balloon
> > will see the poison/init on some pages cleared, however I suppose the
> > balloon doesn't care about the contents of the page. For the pages
> > that are leaked back out via the shrinker they will be dirtied so they
> > will end up being migrated with the correct value eventually.
>
> Right, I think current Linux guests are fine. The critical piece we are
> talking about is
>
> 1) Guest balloon allocates and hints a page
> 2) Hypervisor does not process hinting request
> 3) Guest frees the page and reuses it (especially, writes it).
> 4) Hypervisor processes the hinting request.
>
> AFAIU, as soon as the guest writes the page (e.g., zeroes it/poisons it
> in the buddy, or somebody who allocated it), the page *will* get
> migrated, even if 4) happens after 3). That's guaranteed by the 2-bitmap
> magic.

Yes. Basically the page will be flagged as dirty as soon as it is
written to, and that will be dealt with after the current batch of
hints are processed so we should be able to guarantee that the page
will have a coherent value stored in it.

> Now, assume the following happens (in some future Linux version) (due to
> your "simply not migrating it" comment):
>
> 1) Guest balloon allocates and hints a page. Assume the page is zero due
> to want_init_on_free().
> 2) Hypervisor processes the hinting request.
> 3) Guest frees the page. Assume we are implementing some magic to "skip"
> zeroing, as we assume it is still zero.
>
> Due to 2), the page won't get migrated. In 3) we expect the page to be
> 0. QEMU would have to make sure that we always get either the original,
> or a zero page on the destination. Otherwise, this smells like data
> corruption.

I agree and that is my concern. From the time the page is hinted until
it is written to again it is in an indeterminate state. However with
the current Linux guest implementation it is being written to so
technically there isn't an issue. We would just need to make sure it
stays that way.

In addition it is already an exposed interface and this is the way it
works. I think if anything we are likely forced to document it as-is
and guarantee that the behavior is not changed.

> >
> >>> I'll have to come back and re-read the rest next week, this
> >>> is complex stuff and I'm too rushed with other things today.
> >>
> >> Yeah, I'm also loaded with other stuff. Maybe Alex has time to
> >> understand the details as well.
> >
> > So after looking it over again it makes a bit more sense why this
> > hasn't caused any issues so far, and I can see why the poison enabled
> > setup and hinting can work. The problem is I am left wondering what
> > assumptions we are allowed to leave in place.
> >
> > 1. Can we assume that we don't care about the contents in the pages in
> > the balloon changing?
>
> I think, we should define valid ways for the hypervisor to change it.
>
> "Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by
> a zero page. However, as soon as the page is written by the guest (even
> before the hinting request was processed by the host), the modified page
> will stay - whereby the unwritten parts might either be from the old, or
> from the zero page."

Actually I don't think the zero page is accurate. A page that is
hinted basically implies we don't care about the content. I would
think that we could treat a hinted page as uninitialized memory.

> I think the debatable part is "whereby the unwritten parts might either
> be from the old, or from the zero page". AFAIU, you think it could
> happen in QEMU, that we have neither the old, nor the zero page, but
> instead some previous content. The question is if that's valid, or if
> that's a BUG in QEMU. If it's valid, we can do no optimizations in the
> guest (e.g., skip zeroing in the buddy). And I agree that this smells
> like "data corruption" as Michael said.

So if any part of the page is written to after it is hinted you will
be getting the modified page since it goes back to being "dirty". All
the hinting is doing is skipping the migration of dirty pages that are
"hinted" as long as they are not written to again. So the pages are
valid up until we migrate to the new system. At that point all of the
pages that are in the page hinting balloon will be stale data as we
skipped migrating them. Assuming something like page poisoning or
init_on_free are enabled they will be poisoned again when they are

[PATCH v4] vhost: disable for OABI

2020-04-20 Thread Michael S. Tsirkin
vhost is currently broken on the some ARM configs.

The reason is that the ring element addresses are passed between
components with different alignments assumptions. Thus, if
guest selects a pointer and host then gets and dereferences
it, then alignment assumed by the host's compiler might be
greater than the actual alignment of the pointer.
compiler on the host from assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration. With this OABI, compiler assumes that
all structures are 4 byte aligned - which is stronger than
virtio guarantees for available and used rings, which are
merely 2 bytes. Thus a guest without -mabi=apcs-gnu running
on top of host with -mabi=apcs-gnu will be broken.

The correct fix is to force alignment of structures - however
that is an intrusive fix that's best deferred until the next release.

We didn't previously support such ancient systems at all - this surfaced
after vdpa support prompted removing dependency of vhost on
VIRTULIZATION. So for now, let's just add something along the lines of

depends on !ARM || AEABI

to the virtio Kconfig declaration, and add a comment that it has to do
with struct member alignment.

Note: we can't make VHOST and VHOST_RING themselves have
a dependency since these are selected. Add a new symbol for that.

We should be able to drop this dependency down the road.

Fixes: 20c384f1ea1a0bc7 ("vhost: refine vhost and vringh kconfig")
Suggested-by: Ard Biesheuvel 
Suggested-by: Richard Earnshaw 
Signed-off-by: Michael S. Tsirkin 
---

changes from v3:
update commit log clarifying the motivation and that
it's a temporary fix.

suggested by Christoph Hellwig

 drivers/misc/mic/Kconfig |  2 +-
 drivers/net/caif/Kconfig |  2 +-
 drivers/vdpa/Kconfig |  2 +-
 drivers/vhost/Kconfig| 17 +
 4 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 8f201d019f5a..3bfe72c59864 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -116,7 +116,7 @@ config MIC_COSM
 
 config VOP
tristate "VOP Driver"
-   depends on VOP_BUS
+   depends on VOP_BUS && VHOST_DPN
select VHOST_RING
select VIRTIO
help
diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
index 9db0570c5beb..661c25eb1c46 100644
--- a/drivers/net/caif/Kconfig
+++ b/drivers/net/caif/Kconfig
@@ -50,7 +50,7 @@ config CAIF_HSI
 
 config CAIF_VIRTIO
tristate "CAIF virtio transport driver"
-   depends on CAIF && HAS_DMA
+   depends on CAIF && HAS_DMA && VHOST_DPN
select VHOST_RING
select VIRTIO
select GENERIC_ALLOCATOR
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3e1ceb8e9f2b..e8140065c8a5 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -10,7 +10,7 @@ if VDPA
 
 config VDPA_SIM
tristate "vDPA device simulator"
-   depends on RUNTIME_TESTING_MENU && HAS_DMA
+   depends on RUNTIME_TESTING_MENU && HAS_DMA && VHOST_DPN
select VHOST_RING
default n
help
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 2c75d164b827..c4f273793595 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -13,6 +13,15 @@ config VHOST_RING
  This option is selected by any driver which needs to access
  the host side of a virtio ring.
 
+config VHOST_DPN
+   bool
+   depends on !ARM || AEABI
+   default y
+   help
+ Anything selecting VHOST or VHOST_RING must depend on VHOST_DPN.
+ This excludes the deprecated ARM ABI since that forces a 4 byte
+ alignment on all structs - incompatible with virtio spec requirements.
+
 config VHOST
tristate
select VHOST_IOTLB
@@ -28,7 +37,7 @@ if VHOST_MENU
 
 config VHOST_NET
tristate "Host kernel accelerator for virtio net"
-   depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP)
+   depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) && VHOST_DPN
select VHOST
---help---
  This kernel module can be loaded in host kernel to accelerate
@@ -40,7 +49,7 @@ config VHOST_NET
 
 config VHOST_SCSI
tristate "VHOST_SCSI TCM fabric driver"
-   depends on TARGET_CORE && EVENTFD
+   depends on TARGET_CORE && EVENTFD && VHOST_DPN
select VHOST
default n
---help---
@@ -49,7 +58,7 @@ config VHOST_SCSI
 
 config VHOST_VSOCK
tristate "vhost virtio-vsock driver"
-   depends on VSOCKETS && EVENTFD
+   depends on VSOCKETS && EVENTFD && VHOST_DPN
select VHOST
select VIRTIO_VSOCKETS_COMMON
default n
@@ -63,7 +72,7 @@ config VHOST_VSOCK
 
 config VHOST_VDPA
tristate "Vhost driver for vDPA-based backend"
-   depends on EVENTFD
+   depends on EVENTFD && VHOST_DPN
select VHOST
depends on VDPA
help
-- 
MST

___

Re: [PATCH v3] vhost: disable for OABI

2020-04-20 Thread Michael S. Tsirkin
On Mon, Apr 20, 2020 at 01:29:09AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 16, 2020 at 06:20:20PM -0400, Michael S. Tsirkin wrote:
> > vhost is currently broken on the some ARM configs.
> > 
> > The reason is that that uses apcs-gnu which is the ancient OABI that is been
> > deprecated for a long time.
> > 
> > Given that virtio support on such ancient systems is not needed in the
> > first place, let's just add something along the lines of
> > 
> > depends on !ARM || AEABI
> > 
> > to the virtio Kconfig declaration, and add a comment that it has to do
> > with struct member alignment.
> > 
> > Note: we can't make VHOST and VHOST_RING themselves have
> > a dependency since these are selected. Add a new symbol for that.
> 
> This description is horrible.  The only interesting thing for ARM OABI
> is that it has some strange padding rules, but that isn't something
> that can't be handled.   Please spend some time looking into the issue
> and add te proper __padded annotations, we've done that elsewhere in
> the kernel and it isn't too bad - in fact it helps understanding issues
> with implicit alignment.

Yes I have a patch queued to fix it. I wanted a minimal patch for this
release though.

> And even if you have a good reason not to fix vhost (which I think you
> don't have) this changelog is just utter crap, as it fails to mention
> what the problem with ARM OABI even is.

I'll tweak that, thanks!

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH] virtio-balloon: Disable free page hinting/reporting if page poison is disabled

2020-04-20 Thread David Hildenbrand
>>> Now we are talking about what's safe to do with the page.
>>>
>>> If POISON flag is set by hypervisor but clear by guest,
>>> or poison_val is 0, then it's clear it's safe to blow
>>> away the page if we can figure out it's unused.
>>>
>>> Otherwise, it's much less clear :)
>>
>> Hah! Agreed :D
> 
> That isn't quite true. The problem is in the case of hinting it isn't
> setting the page to 0. It is simply not migrating it. So if there is
> data from an earlier pass it is stuck at that value. So the balloon
> will see the poison/init on some pages cleared, however I suppose the
> balloon doesn't care about the contents of the page. For the pages
> that are leaked back out via the shrinker they will be dirtied so they
> will end up being migrated with the correct value eventually.

Right, I think current Linux guests are fine. The critical piece we are
talking about is

1) Guest balloon allocates and hints a page
2) Hypervisor does not process hinting request
3) Guest frees the page and reuses it (especially, writes it).
4) Hypervisor processes the hinting request.

AFAIU, as soon as the guest writes the page (e.g., zeroes it/poisons it
in the buddy, or somebody who allocated it), the page *will* get
migrated, even if 4) happens after 3). That's guaranteed by the 2-bitmap
magic.



Now, assume the following happens (in some future Linux version) (due to
your "simply not migrating it" comment):

1) Guest balloon allocates and hints a page. Assume the page is zero due
to want_init_on_free().
2) Hypervisor processes the hinting request.
3) Guest frees the page. Assume we are implementing some magic to "skip"
zeroing, as we assume it is still zero.

Due to 2), the page won't get migrated. In 3) we expect the page to be
0. QEMU would have to make sure that we always get either the original,
or a zero page on the destination. Otherwise, this smells like data
corruption.


> 
>>> I'll have to come back and re-read the rest next week, this
>>> is complex stuff and I'm too rushed with other things today.
>>
>> Yeah, I'm also loaded with other stuff. Maybe Alex has time to
>> understand the details as well.
> 
> So after looking it over again it makes a bit more sense why this
> hasn't caused any issues so far, and I can see why the poison enabled
> setup and hinting can work. The problem is I am left wondering what
> assumptions we are allowed to leave in place.
> 
> 1. Can we assume that we don't care about the contents in the pages in
> the balloon changing?

I think, we should define valid ways for the hypervisor to change it.

"Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by
a zero page. However, as soon as the page is written by the guest (even
before the hinting request was processed by the host), the modified page
will stay - whereby the unwritten parts might either be from the old, or
from the zero page."

I think the debatable part is "whereby the unwritten parts might either
be from the old, or from the zero page". AFAIU, you think it could
happen in QEMU, that we have neither the old, nor the zero page, but
instead some previous content. The question is if that's valid, or if
that's a BUG in QEMU. If it's valid, we can do no optimizations in the
guest (e.g., skip zeroing in the buddy). And I agree that this smells
like "data corruption" as Michael said.


> 2. Can we assume that the guest will always rewrite the page after the
> deflate in the case of init_on_free or poison?

Depends on what we think is the right way to do - IOW if we think "some
other content" as mentioned above is a BUG or not.

> 3. Can we assume that free page hinting will always function as a
> balloon setup, so no moving it over to a page reporting type setup?

I think we have to define the valid semantics. That implies what would
be valid to do with it. Unfortunately, we have to reverse-engineer here.

> 
> If we assume the above 3 items then there isn't any point in worrying
> about poison when it comes to free page hinting. It doesn't make sense
> to since they essentially negate it. As such I could go through this
> patch and the QEMU patches and clean up any associations since the to
> are not really tied together in any way.

The big question is, if we want to support VIRTIO_BALLOON_F_PAGE_POISON
with free page hinting. e.g.,:

"Pages hinted via VIRTIO_BALLOON_F_FREE_PAGE_HINT might get replaced by
a page full of X. However, as soon as the page is written by the guest
(even before the hinting request was processed by the host), the
modified page will stay - whereby the unwritten parts might either be
from the old, or from a page filled with X. Without
VIRTIO_BALLOON_F_PAGE_POISON, X is zero, otherwise it is poison_val."

The current QEMU implementation would be to simply migrate all hinted
pages. In the future we could optimize. Not sure if it's worth the trouble.

-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foun

Re: [PATCH v3] vhost: disable for OABI

2020-04-20 Thread Christoph Hellwig
On Thu, Apr 16, 2020 at 06:20:20PM -0400, Michael S. Tsirkin wrote:
> vhost is currently broken on the some ARM configs.
> 
> The reason is that that uses apcs-gnu which is the ancient OABI that is been
> deprecated for a long time.
> 
> Given that virtio support on such ancient systems is not needed in the
> first place, let's just add something along the lines of
> 
>   depends on !ARM || AEABI
> 
> to the virtio Kconfig declaration, and add a comment that it has to do
> with struct member alignment.
> 
> Note: we can't make VHOST and VHOST_RING themselves have
> a dependency since these are selected. Add a new symbol for that.

This description is horrible.  The only interesting thing for ARM OABI
is that it has some strange padding rules, but that isn't something
that can't be handled.   Please spend some time looking into the issue
and add te proper __padded annotations, we've done that elsewhere in
the kernel and it isn't too bad - in fact it helps understanding issues
with implicit alignment.

And even if you have a good reason not to fix vhost (which I think you
don't have) this changelog is just utter crap, as it fails to mention
what the problem with ARM OABI even is.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization