[PATCH v2 0/9] x86: Concurrent TLB flushes

2019-07-03 Thread Nadav Amit via Virtualization
Currently, local and remote TLB flushes are not performed concurrently,
which introduces unnecessary overhead - each INVLPG can take 100s of
cycles. This patch-set allows TLB flushes to be run concurrently: first
request the remote CPUs to initiate the flush, then run it locally, and
finally wait for the remote CPUs to finish their work.

In addition, there are various small optimizations to avoid unwarranted
false-sharing and atomic operations.

The proposed changes should also improve the performance of other
invocations of on_each_cpu(). Hopefully, no one has relied on the old
behavior of on_each_cpu(), which invoked the function on remote CPUs
first and only then locally [Peter says he remembers someone might do
so, but without further information it is hard to know how to address
it].

Running sysbench on dax/ext4 w/emulated-pmem, write-cache disabled on
2-socket, 48-logical-cores (24+SMT) Haswell-X, 5 repetitions:

 sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
  --file-io-mode=mmap --threads=X --file-fsync-mode=fdatasync run

  Th.   tip-jun28 avg (stdev)   +patch-set avg (stdev)   change
  ---   ---------------------   ----------------------   ------
    1   1267765 (14146)         1299253 (5715)           +2.4%
    2   1734644 (11936)         1799225 (19577)          +3.7%
    4   2821268 (41184)         2919132 (40149)          +3.4%
    8   4171652 (31243)         4376925 (65416)          +4.9%
   16   5590729 (24160)         5829866 (8127)           +4.2%
   24   6250212 (24481)         6522303 (28044)          +4.3%
   32   3994314 (26606)         4077543 (10685)          +2.0%
   48   4345177 (28091)         4417821 (41337)          +1.6%

(Note that on configurations with up to 24 threads, numactl was used to
bind all threads to socket 1, which explains the drop in performance
when going to 32 threads.)

Running the same benchmark with security mitigations disabled (PTI,
Spectre, MDS):

  Th.   tip-jun28 avg (stdev)   +patch-set avg (stdev)   change
  ---   ---------------------   ----------------------   ------
    1   1598896 (5174)          1607903 (4091)           +0.5%
    2   2109472 (17827)         2224726 (4372)           +5.4%
    4   3448587 (11952)         3668551 (30219)          +6.3%
    8   5425778 (29641)         5606266 (33519)          +3.3%
   16   6931232 (34677)         7054052 (27873)          +1.7%
   24   7612473 (23482)         7783138 (13871)          +2.2%
   32   4296274 (18029)         4283279 (32323)          -0.3%
   48   4770029 (35541)         4764760 (13575)          -0.1%

Presumably, PTI requires two invalidations of each mapping, which
allows concurrency to deliver higher benefits when PTI is on. At the
same time, when mitigations are on, other overheads reduce the
potential speedup.

I tried to reduce the code size of the main patch, which required
restructuring the series.

v1 -> v2:
* Removing the patches that Thomas took [tglx]
* Adding hyper-v, Xen compile-tested implementations [Dave]
* Removing UV [Andy]
* Adding lazy optimization, removing inline keyword [Dave]
* Restructuring patch-set

RFCv2 -> v1:
* Fix comment on flush_tlb_multi [Juergen]
* Removing async invalidation optimizations [Andy]
* Adding KVM support [Paolo]

Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Boris Ostrovsky 
Cc: Dave Hansen 
Cc: Haiyang Zhang 
Cc: Ingo Molnar 
Cc: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: "K. Y. Srinivasan" 
Cc: Paolo Bonzini 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Sasha Levin 
Cc: Stephen Hemminger 
Cc: Thomas Gleixner 
Cc: k...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: x...@kernel.org
Cc: xen-de...@lists.xenproject.org

Nadav Amit (9):
  smp: Run functions concurrently in smp_call_function_many()
  x86/mm/tlb: Remove reason as argument for flush_tlb_func_local()
  x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
  x86/mm/tlb: Flush remote and local TLBs concurrently
  x86/mm/tlb: Privatize cpu_tlbstate
  x86/mm/tlb: Do not make is_lazy dirty for no reason
  cpumask: Mark functions as pure
  x86/mm/tlb: Remove UV special case
  x86/mm/tlb: Remove unnecessary uses of the inline keyword

 arch/x86/hyperv/mmu.c |  13 ++-
 arch/x86/include/asm/paravirt.h   |   6 +-
 arch/x86/include/asm/paravirt_types.h |   4 +-
 arch/x86/include/asm/tlbflush.h   |  48 +
 arch/x86/include/asm/trace/hyperv.h   |   2 +-
 arch/x86/kernel/kvm.c |  11 +-
 arch/x86/kernel/paravirt.c|   2 +-
 arch/x86/mm/init.c|   2 +-
 arch/x86/mm/tlb.c | 147 --
 arch/x86/xen/mmu_pv.c |  20 ++--
 include/linux/cpumask.h   |   6 +-
 include/linux/smp.h   |  27 +++--
 include/trace/events/xen.h|   2 +-
 kernel/smp.c  | 133 +++
 14 files changed, 245 insertions(+), 178 deletions(-)

[PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

2019-07-03 Thread Nadav Amit via Virtualization
To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. Introduce
paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
and hyper-v are only compile-tested).

While the updated smp infrastructure is capable of running a function on
a single local core, it is not optimized for this case. The multiple
function calls and the indirect branch introduce some overhead, and
might make local TLB flushes slower than they were before the recent
changes.

Before calling the SMP infrastructure, check whether only a local TLB
flush is needed, to restore the lost performance in this common case.
This requires checking mm_cpumask() one more time, but unless this mask
is updated very frequently, it should not negatively impact performance.
A sketch of such a fast path is shown below.
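
For illustration only (not part of the patch; flush_tlb_mm_range_sketch()
is a hypothetical condensed form using the flush_tlb_func_local()/
flush_tlb_multi() names from this series):

/*
 * Sketch: skip the SMP infrastructure when the local CPU is the only
 * one with this mm loaded. cpumask_any_but() returns >= nr_cpu_ids
 * when no CPU other than 'cpu' is set in the mask.
 */
static void flush_tlb_mm_range_sketch(struct mm_struct *mm,
				      struct flush_tlb_info *info)
{
	int cpu = get_cpu();

	if (cpumask_any_but(mm_cpumask(mm), cpu) >= nr_cpu_ids)
		flush_tlb_func_local(info);	/* common, cheap case */
	else
		flush_tlb_multi(mm_cpumask(mm), info);

	put_cpu();
}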

Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Sasha Levin 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: Juergen Gross 
Cc: Paolo Bonzini 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Boris Ostrovsky 
Cc: linux-hyp...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: k...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Nadav Amit 
---
 arch/x86/hyperv/mmu.c | 13 +++---
 arch/x86/include/asm/paravirt.h   |  6 +--
 arch/x86/include/asm/paravirt_types.h |  4 +-
 arch/x86/include/asm/tlbflush.h   |  9 ++--
 arch/x86/include/asm/trace/hyperv.h   |  2 +-
 arch/x86/kernel/kvm.c | 11 +++--
 arch/x86/kernel/paravirt.c|  2 +-
 arch/x86/mm/tlb.c | 65 ---
 arch/x86/xen/mmu_pv.c | 20 ++---
 include/trace/events/xen.h|  2 +-
 10 files changed, 91 insertions(+), 43 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index e65d7fe6489f..1177f863e4cd 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -50,8 +50,8 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
return gva_n - offset;
 }
 
-static void hyperv_flush_tlb_others(const struct cpumask *cpus,
-   const struct flush_tlb_info *info)
+static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
+  const struct flush_tlb_info *info)
 {
int cpu, vcpu, gva_n, max_gvas;
struct hv_tlb_flush **flush_pcpu;
@@ -59,7 +59,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
u64 status = U64_MAX;
unsigned long flags;
 
-   trace_hyperv_mmu_flush_tlb_others(cpus, info);
+   trace_hyperv_mmu_flush_tlb_multi(cpus, info);
 
if (!hv_hypercall_pg)
goto do_native;
@@ -69,6 +69,9 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 
local_irq_save(flags);
 
+   if (cpumask_test_cpu(smp_processor_id(), cpus))
+   flush_tlb_func_local(info);
+
flush_pcpu = (struct hv_tlb_flush **)
 this_cpu_ptr(hyperv_pcpu_input_arg);
 
@@ -156,7 +159,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
if (!(status & HV_HYPERCALL_RESULT_MASK))
return;
 do_native:
-   native_flush_tlb_others(cpus, info);
+   native_flush_tlb_multi(cpus, info);
 }
 
 static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
@@ -231,6 +234,6 @@ void hyperv_setup_mmu_ops(void)
return;
 
pr_info("Using hypercall for remote TLB flush\n");
-   pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
+   pv_ops.mmu.flush_tlb_multi = hyperv_flush_tlb_multi;
pv_ops.mmu.tlb_remove_table = tlb_remove_table;
 }
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c25c38a05c1c..316959e89258 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -62,10 +62,10 @@ static inline void __flush_tlb_one_user(unsigned long addr)
PVOP_VCALL1(mmu.flush_tlb_one_user, addr);
 }
 
-static inline void flush_tlb_others(const struct cpumask *cpumask,
-   const struct flush_tlb_info *info)
+static inline void flush_tlb_multi(const struct cpumask *cpumask,
+  const struct flush_tlb_info *info)
 {
-   PVOP_VCALL2(mmu.flush_tlb_others, cpumask, info);
+   PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info);
 }
 
 static inline void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 946f8f1f1efc..54f4c718b5b0 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -211,8 +211,8 @@ struct pv_mmu_ops {
void (*flush_tlb_user)(void);
void (*flush_tlb_kernel)(void);
void (*flush_tlb_one_user)(unsigned long addr);
-   vo

[PATCH 3/5] drm/ast: Replace struct ast_fbdev with generic framebuffer emulation

2019-07-03 Thread Thomas Zimmermann
This patch replaces ast's framebuffer console with DRM's generic
implementation. All respective code is removed from the driver.

The console is set up with a shadow buffer. The actual buffer object is
not permanently pinned in video ram, but is just another buffer object
that the driver moves in and out of vram as necessary. The driver's
function ast_crtc_do_set_base() used to contain special handling for the
framebuffer console. With the new generic framebuffer, the driver no
longer needs this code.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/ast/Makefile   |   2 +-
 drivers/gpu/drm/ast/ast_drv.c  |  22 ++-
 drivers/gpu/drm/ast/ast_drv.h  |  17 --
 drivers/gpu/drm/ast/ast_fb.c   | 341 -
 drivers/gpu/drm/ast/ast_main.c |  30 ++-
 drivers/gpu/drm/ast/ast_mode.c |  21 --
 6 files changed, 41 insertions(+), 392 deletions(-)
 delete mode 100644 drivers/gpu/drm/ast/ast_fb.c

diff --git a/drivers/gpu/drm/ast/Makefile b/drivers/gpu/drm/ast/Makefile
index b086dae17013..561f7c4199e4 100644
--- a/drivers/gpu/drm/ast/Makefile
+++ b/drivers/gpu/drm/ast/Makefile
@@ -3,6 +3,6 @@
 # Makefile for the drm device driver.  This driver provides support for the
 # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
 
-ast-y := ast_drv.o ast_main.o ast_mode.o ast_fb.o ast_ttm.o ast_post.o ast_dp501.o
+ast-y := ast_drv.o ast_main.o ast_mode.o ast_ttm.o ast_post.o ast_dp501.o
 
 obj-$(CONFIG_DRM_AST) := ast.o
diff --git a/drivers/gpu/drm/ast/ast_drv.c b/drivers/gpu/drm/ast/ast_drv.c
index 3811997e78c4..75f55ac1a218 100644
--- a/drivers/gpu/drm/ast/ast_drv.c
+++ b/drivers/gpu/drm/ast/ast_drv.c
@@ -103,25 +103,29 @@ static int ast_drm_freeze(struct drm_device *dev)
 
pci_save_state(dev->pdev);
 
-   console_lock();
-   ast_fbdev_set_suspend(dev, 1);
-   console_unlock();
+   if (dev->fb_helper) {
+   console_lock();
+   drm_fb_helper_set_suspend(dev->fb_helper, 1);
+   console_unlock();
+   }
+
return 0;
 }
 
 static int ast_drm_thaw(struct drm_device *dev)
 {
-   int error = 0;
-
ast_post_gpu(dev);
 
drm_mode_config_reset(dev);
drm_helper_resume_force_mode(dev);
 
-   console_lock();
-   ast_fbdev_set_suspend(dev, 0);
-   console_unlock();
-   return error;
+   if (dev->fb_helper) {
+   console_lock();
+   drm_fb_helper_set_suspend(dev->fb_helper, 0);
+   console_unlock();
+   }
+
+   return 0;
 }
 
 static int ast_drm_resume(struct drm_device *dev)
diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
index ca794a3fcf8f..907d7480b7ba 100644
--- a/drivers/gpu/drm/ast/ast_drv.h
+++ b/drivers/gpu/drm/ast/ast_drv.h
@@ -81,8 +81,6 @@ enum ast_tx_chip {
 #define AST_DRAM_4Gx16   7
 #define AST_DRAM_8Gx16   8
 
-struct ast_fbdev;
-
 struct ast_private {
struct drm_device *dev;
 
@@ -96,8 +94,6 @@ struct ast_private {
uint32_t mclk;
uint32_t vram_size;
 
-   struct ast_fbdev *fbdev;
-
int fb_mtrr;
 
struct drm_gem_object *cursor_cache;
@@ -239,14 +235,6 @@ struct ast_encoder {
struct drm_encoder base;
 };
 
-struct ast_fbdev {
-   struct drm_fb_helper helper; /* must be first */
-   void *sysram;
-   int size;
-   int x1, y1, x2, y2; /* dirty rect */
-   spinlock_t dirty_lock;
-};
-
 #define to_ast_crtc(x) container_of(x, struct ast_crtc, base)
 #define to_ast_connector(x) container_of(x, struct ast_connector, base)
 #define to_ast_encoder(x) container_of(x, struct ast_encoder, base)
@@ -289,11 +277,6 @@ struct ast_vbios_mode_info {
 extern int ast_mode_init(struct drm_device *dev);
 extern void ast_mode_fini(struct drm_device *dev);
 
-int ast_fbdev_init(struct drm_device *dev);
-void ast_fbdev_fini(struct drm_device *dev);
-void ast_fbdev_set_suspend(struct drm_device *dev, int state);
-void ast_fbdev_set_base(struct ast_private *ast, unsigned long gpu_addr);
-
 #define AST_MM_ALIGN_SHIFT 4
 #define AST_MM_ALIGN_MASK ((1 << AST_MM_ALIGN_SHIFT) - 1)
 
diff --git a/drivers/gpu/drm/ast/ast_fb.c b/drivers/gpu/drm/ast/ast_fb.c
deleted file mode 100644
index a849e58b40bc..
--- a/drivers/gpu/drm/ast/ast_fb.c
+++ /dev/null
@@ -1,341 +0,0 @@
-/*
- * Copyright 2012 Red Hat Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the
- * "Software"), to deal in the Software without restriction, including
- * without limitation the rights to use, copy, modify, merge, publish,
- * distribute, sub license, and/or sell copies of the Software, and to
- * permit persons to whom the Software is furnished to do so, subject to
- * the following conditions:
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A P

[PATCH 1/5] drm/client: Support unmapping of DRM client buffers

2019-07-03 Thread Thomas Zimmermann
DRM clients, such as the fbdev emulation, have their buffer objects
mapped by default. Mapping a buffer implicitly prevents its relocation.
Hence, the buffer may permanently consume video memory while it's
allocated. This is a problem for drivers of low-memory devices, such as
ast, mgag200 or older framebuffer hardware, which will then not have
enough memory to display other content (e.g., X11).

This patch introduces drm_client_buffer_vmap() and _vunmap(). Internal
DRM clients can use these functions to unmap and remap buffer objects
as needed.

There's no reference counting for vmap operations. Callers are expected
to either keep buffers mapped (as it is now), or call vmap and vunmap
in pairs around code that accesses the mapped memory.
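
For example, a caller that only needs access briefly could wrap it in a
vmap/vunmap pair (illustrative sketch; 'off', 'src' and 'len' are
placeholders):

	void *vaddr = drm_client_buffer_vmap(buffer);

	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);
	memcpy(vaddr + off, src, len);	/* access the mapped memory */
	drm_client_buffer_vunmap(buffer);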

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_client.c | 71 +++-
 include/drm/drm_client.h |  3 ++
 2 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
index 410572f14257..d04660c4470a 100644
--- a/drivers/gpu/drm/drm_client.c
+++ b/drivers/gpu/drm/drm_client.c
@@ -235,7 +235,8 @@ static void drm_client_buffer_delete(struct drm_client_buffer *buffer)
 {
struct drm_device *dev = buffer->client->dev;
 
-   drm_gem_vunmap(buffer->gem, buffer->vaddr);
+   if (buffer->vaddr)
+   drm_gem_vunmap(buffer->gem, buffer->vaddr);
 
if (buffer->gem)
drm_gem_object_put_unlocked(buffer->gem);
@@ -281,6 +282,43 @@ drm_client_buffer_create(struct drm_client_dev *client, u32 width, u32 height, u
 
buffer->gem = obj;
 
+   vaddr = drm_client_buffer_vmap(buffer);
+   if (IS_ERR(vaddr)) {
+   ret = PTR_ERR(vaddr);
+   goto err_delete;
+   }
+
+   return buffer;
+
+err_delete:
+   drm_client_buffer_delete(buffer);
+
+   return ERR_PTR(ret);
+}
+
+/**
+ * drm_client_buffer_vmap - Map DRM client buffer into address space
+ * @buffer: DRM client buffer
+ *
+ * This function maps a client buffer into kernel address space. If the
+ * buffer is already mapped, it returns the mapping's address.
+ *
+ * Client buffer mappings are not ref'counted. Each call to
+ * drm_client_buffer_vmap() should be followed by a call to
+ * drm_client_buffer_vunmap(); or the client buffer should be mapped
+ * throughout its lifetime. The latter is the default.
+ *
+ * Returns:
+ * The mapped memory's address
+ */
+void *
+drm_client_buffer_vmap(struct drm_client_buffer *buffer)
+{
+   void *vaddr;
+
+   if (buffer->vaddr)
+   return buffer->vaddr;
+
/*
 * FIXME: The dependency on GEM here isn't required, we could
 * convert the driver handle to a dma-buf instead and use the
@@ -289,21 +327,34 @@ drm_client_buffer_create(struct drm_client_dev *client, u32 width, u32 height, u
 * fd_install step out of the driver backend hooks, to make that
 * final step optional for internal users.
 */
-   vaddr = drm_gem_vmap(obj);
-   if (IS_ERR(vaddr)) {
-   ret = PTR_ERR(vaddr);
-   goto err_delete;
-   }
+   vaddr = drm_gem_vmap(buffer->gem);
+   if (IS_ERR(vaddr))
+   return vaddr;
 
buffer->vaddr = vaddr;
 
-   return buffer;
+   return vaddr;
+}
+EXPORT_SYMBOL(drm_client_buffer_vmap);
 
-err_delete:
-   drm_client_buffer_delete(buffer);
+/**
+ * drm_client_buffer_vunmap - Unmap DRM client buffer
+ * @buffer: DRM client buffer
+ *
+ * This function removes a client buffer's memory mapping. This
+ * function is only required by clients that manage their buffers
+ * by themselves. By default, DRM client buffers are mapped throughout
+ * their entire lifetime.
+ */
+void drm_client_buffer_vunmap(struct drm_client_buffer *buffer)
+{
+   if (!buffer->vaddr)
+   return;
 
-   return ERR_PTR(ret);
+   drm_gem_vunmap(buffer->gem, buffer->vaddr);
+   buffer->vaddr = NULL;
 }
+EXPORT_SYMBOL(drm_client_buffer_vunmap);
 
 static void drm_client_buffer_rmfb(struct drm_client_buffer *buffer)
 {
diff --git a/include/drm/drm_client.h b/include/drm/drm_client.h
index 72d51d1e9dd9..e1db1d9da0bf 100644
--- a/include/drm/drm_client.h
+++ b/include/drm/drm_client.h
@@ -149,6 +149,9 @@ struct drm_client_buffer {
 struct drm_client_buffer *
 drm_client_framebuffer_create(struct drm_client_dev *client, u32 width, u32 height, u32 format);
 void drm_client_framebuffer_delete(struct drm_client_buffer *buffer);
+void *
+drm_client_buffer_vmap(struct drm_client_buffer *buffer);
+void drm_client_buffer_vunmap(struct drm_client_buffer *buffer);
 
 int drm_client_modeset_create(struct drm_client_dev *client);
 void drm_client_modeset_free(struct drm_client_dev *client);
-- 
2.21.0


[PATCH 4/5] drm/bochs: Use shadow buffer for bochs framebuffer console

2019-07-03 Thread Thomas Zimmermann
The bochs driver (and virtual hardware) requires buffer objects to
reside in video ram to display them on the screen. So it cannot
display the framebuffer console, because the respective buffer object
is permanently pinned in system memory.

Using a shadow buffer for the console solves this problem. The console
emulation will pin the buffer object only during updates from the shadow
buffer. Otherwise, the bochs driver can freely relocate the buffer
between system memory and video ram.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/bochs/bochs_kms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bochs/bochs_kms.c b/drivers/gpu/drm/bochs/bochs_kms.c
index bc19dbd531ef..47fab4852483 100644
--- a/drivers/gpu/drm/bochs/bochs_kms.c
+++ b/drivers/gpu/drm/bochs/bochs_kms.c
@@ -172,7 +172,7 @@ bochs_gem_fb_create(struct drm_device *dev, struct drm_file *file,
mode_cmd->pixel_format != DRM_FORMAT_BGRX)
return ERR_PTR(-EINVAL);
 
-   return drm_gem_fb_create(dev, file, mode_cmd);
+   return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
 }
 
 const struct drm_mode_config_funcs bochs_mode_funcs = {
-- 
2.21.0



[PATCH 0/5] Unmappable DRM client buffers for fbdev emulation

2019-07-03 Thread Thomas Zimmermann
DRM client buffers are permanently mapped throughout their lifetime. This
prevents us from using generic framebuffer emulation for devices with
small dedicated video memory, such as ast or mgag200. With fb buffers
permanently mapped, such devices often won't have enough space left to
display other content (e.g., X11).

This patch set introduces unmappable DRM client buffers for framebuffer
emulation with shadow buffers. While the shadow buffer remains in system
memory permanently, the respective buffer object will only be mapped briefly
during updates from the shadow buffer. Hence, the driver can relocate the
buffer object among memory regions as needed.

The default behaviour for DRM client buffers is still to be permanently
mapped.

The patch set converts ast and mgag200 to generic framebuffer emulation
and removes a large amount of framebuffer code from these drivers. For
bochs, a problem was reported where the driver could not display the console
because it was pinned in system memory. [1] The patch set fixes this bug
by converting bochs to use the shadow fb.

The patch set has been tested on ast and mgag200 HW.

[1] https://lists.freedesktop.org/archives/dri-devel/2019-June/224423.html

Thomas Zimmermann (5):
  drm/client: Support unmapping of DRM client buffers
  drm/fb-helper: Unmap BO for shadow-buffered framebuffer console
  drm/ast: Replace struct ast_fbdev with generic framebuffer emulation
  drm/bochs: Use shadow buffer for bochs framebuffer console
  drm/mgag200: Replace struct mga_fbdev with generic framebuffer
emulation

 drivers/gpu/drm/ast/Makefile   |   2 +-
 drivers/gpu/drm/ast/ast_drv.c  |  22 +-
 drivers/gpu/drm/ast/ast_drv.h  |  17 --
 drivers/gpu/drm/ast/ast_fb.c   | 341 -
 drivers/gpu/drm/ast/ast_main.c |  30 ++-
 drivers/gpu/drm/ast/ast_mode.c |  21 --
 drivers/gpu/drm/bochs/bochs_kms.c  |   2 +-
 drivers/gpu/drm/drm_client.c   |  71 -
 drivers/gpu/drm/drm_fb_helper.c|  14 +-
 drivers/gpu/drm/mgag200/Makefile   |   2 +-
 drivers/gpu/drm/mgag200/mgag200_drv.h  |  19 --
 drivers/gpu/drm/mgag200/mgag200_fb.c   | 309 --
 drivers/gpu/drm/mgag200/mgag200_main.c |  61 +++--
 drivers/gpu/drm/mgag200/mgag200_mode.c |  27 --
 include/drm/drm_client.h   |   3 +
 15 files changed, 154 insertions(+), 787 deletions(-)
 delete mode 100644 drivers/gpu/drm/ast/ast_fb.c
 delete mode 100644 drivers/gpu/drm/mgag200/mgag200_fb.c

--
2.21.0



[PATCH 2/5] drm/fb-helper: Unmap BO for shadow-buffered framebuffer console

2019-07-03 Thread Thomas Zimmermann
The shadow-buffered framebuffer console only needs the buffer object
to be mapped during updates. While not being updated from the shadow
buffer, the buffer object can remain unmapped.

An unmapped buffer object can be evicted to system memory and does
not consume video ram until displayed. This allows generic fbdev
emulation to be used with drivers for low-memory devices, such as ast
and mgag200.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_fb_helper.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 1984e5c54d58..2a6538f229e3 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -403,6 +403,7 @@ static void drm_fb_helper_dirty_work(struct work_struct *work)
struct drm_clip_rect *clip = &helper->dirty_clip;
struct drm_clip_rect clip_copy;
unsigned long flags;
+   void *vaddr;
 
spin_lock_irqsave(&helper->dirty_lock, flags);
clip_copy = *clip;
@@ -412,10 +413,18 @@ static void drm_fb_helper_dirty_work(struct work_struct *work)
 
/* call dirty callback only when it has been really touched */
if (clip_copy.x1 < clip_copy.x2 && clip_copy.y1 < clip_copy.y2) {
+
/* Generic fbdev uses a shadow buffer */
-   if (helper->buffer)
+   if (helper->buffer) {
+   vaddr = drm_client_buffer_vmap(helper->buffer);
+   if (IS_ERR(vaddr))
+   return;
drm_fb_helper_dirty_blit_real(helper, &clip_copy);
+   }
helper->fb->funcs->dirty(helper->fb, NULL, 0, 0, &clip_copy, 1);
+
+   if (helper->buffer)
+   drm_client_buffer_vunmap(helper->buffer);
}
 }
 
@@ -2231,6 +2240,9 @@ int drm_fb_helper_generic_probe(struct drm_fb_helper *fb_helper,
fbi->fbdefio = &drm_fbdev_defio;
 
fb_deferred_io_init(fbi);
+
+   /* unmapped by default */
+   drm_client_buffer_vunmap(fb_helper->buffer);
}
 
return 0;
-- 
2.21.0



[PATCH 5/5] drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation

2019-07-03 Thread Thomas Zimmermann
This patch replaces mgag200's framebuffer console with DRM's generic
implementation. All respective code is removed from the driver.

The console is set up with a shadow buffer. The actual buffer object is
not permanently pinned in video ram, but is just another buffer object
that the driver moves in and out of vram as necessary. The driver's
function mga_crtc_do_set_base() used to contain special handling for the
framebuffer console. With the new generic framebuffer, the driver no
longer needs this code.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/mgag200/Makefile   |   2 +-
 drivers/gpu/drm/mgag200/mgag200_drv.h  |  19 --
 drivers/gpu/drm/mgag200/mgag200_fb.c   | 309 -
 drivers/gpu/drm/mgag200/mgag200_main.c |  61 ++---
 drivers/gpu/drm/mgag200/mgag200_mode.c |  27 ---
 5 files changed, 35 insertions(+), 383 deletions(-)
 delete mode 100644 drivers/gpu/drm/mgag200/mgag200_fb.c

diff --git a/drivers/gpu/drm/mgag200/Makefile b/drivers/gpu/drm/mgag200/Makefile
index 98d204408bd0..04b281bcf655 100644
--- a/drivers/gpu/drm/mgag200/Makefile
+++ b/drivers/gpu/drm/mgag200/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 mgag200-y   := mgag200_main.o mgag200_mode.o mgag200_cursor.o \
-   mgag200_drv.o mgag200_fb.o mgag200_i2c.o mgag200_ttm.o
+   mgag200_drv.o mgag200_i2c.o mgag200_ttm.o
 
 obj-$(CONFIG_DRM_MGAG200) += mgag200.o
diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.h b/drivers/gpu/drm/mgag200/mgag200_drv.h
index 3ab27f1053c1..1c93f8dc08c7 100644
--- a/drivers/gpu/drm/mgag200/mgag200_drv.h
+++ b/drivers/gpu/drm/mgag200/mgag200_drv.h
@@ -99,14 +99,6 @@
 #define to_mga_encoder(x) container_of(x, struct mga_encoder, base)
 #define to_mga_connector(x) container_of(x, struct mga_connector, base)
 
-struct mga_fbdev {
-   struct drm_fb_helper helper; /* must be first */
-   void *sysram;
-   int size;
-   int x1, y1, x2, y2; /* dirty rect */
-   spinlock_t dirty_lock;
-};
-
 struct mga_crtc {
struct drm_crtc base;
u8 lut_r[256], lut_g[256], lut_b[256];
@@ -180,7 +172,6 @@ struct mga_device {
struct mga_mc   mc;
struct mga_mode_infomode_info;
 
-   struct mga_fbdev *mfbdev;
struct mga_cursor cursor;
 
boolsuspended;
@@ -201,19 +192,9 @@ struct mga_device {
 int mgag200_modeset_init(struct mga_device *mdev);
 void mgag200_modeset_fini(struct mga_device *mdev);
 
-   /* mgag200_fb.c */
-int mgag200_fbdev_init(struct mga_device *mdev);
-void mgag200_fbdev_fini(struct mga_device *mdev);
-
/* mgag200_main.c */
 int mgag200_driver_load(struct drm_device *dev, unsigned long flags);
 void mgag200_driver_unload(struct drm_device *dev);
-int mgag200_gem_create(struct drm_device *dev,
-  u32 size, bool iskernel,
-  struct drm_gem_object **obj);
-int mgag200_dumb_create(struct drm_file *file,
-   struct drm_device *dev,
-   struct drm_mode_create_dumb *args);
 
/* mgag200_i2c.c */
 struct mga_i2c_chan *mgag200_i2c_create(struct drm_device *dev);
diff --git a/drivers/gpu/drm/mgag200/mgag200_fb.c b/drivers/gpu/drm/mgag200/mgag200_fb.c
deleted file mode 100644
index c77cf1b34c98..
--- a/drivers/gpu/drm/mgag200/mgag200_fb.c
+++ /dev/null
@@ -1,309 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright 2010 Matt Turner.
- * Copyright 2012 Red Hat
- *
- * Authors: Matthew Garrett
- *  Matt Turner
- *  Dave Airlie
- */
-
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "mgag200_drv.h"
-
-static void mga_dirty_update(struct mga_fbdev *mfbdev,
-int x, int y, int width, int height)
-{
-   int i;
-   struct drm_gem_vram_object *gbo;
-   int src_offset, dst_offset;
-   int ret;
-   u8 *dst;
-   bool unmap = false;
-   bool store_for_later = false;
-   int x2, y2;
-   unsigned long flags;
-   struct drm_framebuffer *fb = mfbdev->helper.fb;
-   int bpp = fb->format->cpp[0];
-
-   gbo = drm_gem_vram_of_gem(fb->obj[0]);
-
-   if (drm_can_sleep()) {
-   /* We pin the BO so it won't be moved during the
-* update. The actual location, video RAM or system
-* memory, is not important.
-*/
-   ret = drm_gem_vram_pin(gbo, 0);
-   if (ret) {
-   if (ret != -EBUSY)
-   return;
-   store_for_later = true;
-   }
-   } else {
-   store_for_later = true;
-   }
-
-   x2 = x + width - 1;
-   y2 = y + height - 1;
-   spin_lock_irqsave(&mfbdev->dirty_lock, flags);
-
-   if (mfbdev->y1 < y)
-   y = mfbdev->y1;
-   i

Re: [PATCH v2 0/3] vsock/virtio: several fixes in the .probe() and .remove()

2019-07-03 Thread Stefan Hajnoczi
On Mon, Jul 01, 2019 at 07:03:57PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 01, 2019 at 04:11:13PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jun 28, 2019 at 02:36:56PM +0200, Stefano Garzarella wrote:
> > > During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> > > before registering the driver", Stefan pointed out some possible issues
> > > in the .probe() and .remove() callbacks of the virtio-vsock driver.
> > > 
> > > This series tries to solve these issues:
> > > - Patch 1 adds RCU critical sections to avoid use-after-free of
> > >   'the_virtio_vsock' pointer.
> > > - Patch 2 stops workers before to call vdev->config->reset(vdev) to
> > >   be sure that no one is accessing the device.
> > > - Patch 3 moves the works flush at the end of the .remove() to avoid
> > >   use-after-free of 'vsock' object.
> > > 
> > > v2:
> > > - Patch 1: use RCU to protect 'the_virtio_vsock' pointer
> > > - Patch 2: no changes
> > > - Patch 3: flush works only at the end of .remove()
> > > - Removed patch 4 because virtqueue_detach_unused_buf() returns all the 
> > > buffers
> > >   allocated.
> > > 
> > > v1: https://patchwork.kernel.org/cover/10964733/
> > 
> > This looks good to me.
> 
> Thanks for the review!
> 
> > 
> > Did you run any stress tests?  For example an SMP guest constantly
> > connecting and sending packets together with a script that
> > hotplug/unplugs vhost-vsock-pci from the host side.
> 
> Yes, I started an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
> and I run these scripts to stress the .probe()/.remove() path:
> 
> - guest
>   while true; do
>   cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
>   cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
>   cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
>   cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
>   wait
>   done
> 
> - host
>   while true; do
>   cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
>   cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
>   cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
>   cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
>   sleep 2
>   echo "device_del v1" | nc 127.0.0.1 1234
>   sleep 1
>   echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234
>   sleep 1
>   done
> 
> Do you think is enough or is better to have a test more accurate?

That's good when left running overnight so that thousands of hotplug
events are tested.

Stefan



[RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
 #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.

Each message will be written to or read from this region at offset 0:

int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
int ret;

ret = pwrite64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count)
return -1;

return 0;
}

int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
uint64_t request = op->request;
int ret;

ret = pread64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count || request != op->request)
return -1;

return 0;
}

It's quite straightforward to set things to the device. Just need to
write the message to device directly:

int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
{
struct vhost_vfio_op op;

op.request = VHOST_SET_FEATURES;
op.flags = 0;
op.size = sizeof(features);
op.payload.u64 = features;

return vhost_vfio_write(dev, &op);
}

To get things from the device, two steps are needed.
Take VHOST_GET_FEATURES as an example:

int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
{
struct vhost_vfio_op op;
int ret;

op.request = VHOST_GET_FEATURES;
op.flags = VHOST_VFIO_NEED_REPLY;
op.size = 0;

/* Just need to write the header */
ret = vhost_vfio_write(dev, &op);
if (ret != 0)
goto out;

/* `op` wasn't changed during write */
op.flags = 0;
op.size = sizeof(*features);

ret = vhost_vfio_read(dev, &op);
if (ret != 0)
goto out;

*features = op.payload.u64;
out:
return ret;
}

2. NOTIFY_REGION (mmap-able)

The region described by NOTIFY_REGION will be used to notify
the device.

Each queue will have a page for notification, and it can be mapped
into the VM (if the hardware also supports this), so that the virtio
driver in the VM will be able to notify the device directly.
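
For example, userspace could map a queue's notification page and kick
the device with a single store (illustrative sketch; it borrows the
vfio->device_fd, vfio->notify_offset, host_page_size and queue_idx
names from the snippet below, and assumes the hardware exposes a
doorbell register in that page):

	void *page = mmap(NULL, host_page_size, PROT_WRITE, MAP_SHARED,
			  vfio->device_fd,
			  vfio->notify_offset + host_page_size * queue_idx);
	if (page != MAP_FAILED)
		*(volatile uint16_t *)page = queue_idx;	/* doorbell write */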

The region described by NOTIFY_REGION is also write-able. If
the accelerator's notification register(s) cannot be mapped to
the VM, write() can also be used to notify the device. Something
like this:

void notify_relay(void *opaque)
{
..
offset = host_page_size * queue_idx;

ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
vfio->notify_offset + offset);
..
}

3. VFIO interrupt ioctl API

VFIO interrupt ioctl API is used to setup device interrupts.
IRQ-bypass can also be supported.

Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
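
For example (illustrative sketch; it uses the standard
VFIO_DEVICE_SET_IRQS ioctl together with the VFIO_VHOST_VQ_IRQ_INDEX
proposed here, and vhost_vfio_set_vq_irq() is a hypothetical helper):

#include <linux/vfio.h>
#include <string.h>
#include <sys/ioctl.h>

int vhost_vfio_set_vq_irq(int device_fd, unsigned int queue_idx, int callfd)
{
	char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;

	irq_set->argsz = sizeof(buf);
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
			 VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_VHOST_VQ_IRQ_INDEX;	/* added by this RFC */
	irq_set->start = queue_idx;	/* one vector per virtqueue */
	irq_set->count = 1;
	memcpy(irq_set->data, &callfd, sizeof(int));	/* virtqueue's callfd */

	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
}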

Signed-off-by: Tiwei Bie 
---
 drivers/vhost/Makefile |   2 +
 drivers/vhost/vdpa.c   | 770 +
 include/linux/vdpa_mdev.h  |  72 
 include/uapi/linux/vfio.h  |  19 +
 include/uapi/linux/vhost.h |  25 ++
 5 files changed, 888 insertions(+)
 create mode 100644 drivers/vhost/vdpa.c
 create mode 100644 include/linux/vdpa_mdev.h

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..cabb71095940 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -10,4 +10,6 @@ vhost_vsock-y := vsock.o
 
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
+obj-$(CONFIG_VHOST_VFIO) += vdpa.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
new file mode 100644
index ..5c9426e2a091
--- /dev/null
+++ b/drivers/vhost/vdpa.c
@

Re: [PATCH v2 1/3] vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock

2019-07-03 Thread Jason Wang


On 2019/6/28 8:36 PM, Stefano Garzarella wrote:

Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.

To solve this issue we move the assignment of the_virtio_vsock to the
end of .probe(), when we have finished all the initialization, and to
the beginning of .remove(), before releasing resources.
For the same reason, we do the same for vdev->priv.

We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.

We also take the mutex during the .remove() to prevent .probe() from
running while we are resetting the device.
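
Condensed, the .remove() ordering this relies on looks roughly like
this (a sketch, not the full diff below):

	mutex_lock(&the_virtio_vsock_mutex);
	vdev->priv = NULL;
	rcu_assign_pointer(the_virtio_vsock, NULL);
	synchronize_rcu();	/* wait for all RCU readers to finish */
	/* ... stop workers, reset the device, flush works ... */
	kfree(vsock);
	mutex_unlock(&the_virtio_vsock_mutex);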

Signed-off-by: Stefano Garzarella 
---
  net/vmw_vsock/virtio_transport.c | 67 +---
  1 file changed, 44 insertions(+), 23 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 9c287e3e393c..7ad510ec12e0 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -65,19 +65,22 @@ struct virtio_vsock {
u32 guest_cid;
  };
  
-static struct virtio_vsock *virtio_vsock_get(void)
-{
-   return the_virtio_vsock;
-}
-
  static u32 virtio_transport_get_local_cid(void)
  {
-   struct virtio_vsock *vsock = virtio_vsock_get();
+   struct virtio_vsock *vsock;
+   u32 ret;
  
-	if (!vsock)
-   return VMADDR_CID_ANY;
+   rcu_read_lock();
+   vsock = rcu_dereference(the_virtio_vsock);
+   if (!vsock) {
+   ret = VMADDR_CID_ANY;
+   goto out_rcu;
+   }
  
-	return vsock->guest_cid;
+   ret = vsock->guest_cid;
+out_rcu:
+   rcu_read_unlock();
+   return ret;
  }
  
  static void virtio_transport_loopback_work(struct work_struct *work)
@@ -197,14 +200,18 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
struct virtio_vsock *vsock;
int len = pkt->len;
  
-	vsock = virtio_vsock_get();
+   rcu_read_lock();
+   vsock = rcu_dereference(the_virtio_vsock);
if (!vsock) {
virtio_transport_free_pkt(pkt);
-   return -ENODEV;
+   len = -ENODEV;
+   goto out_rcu;
}
  
-	if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid)
-   return virtio_transport_send_pkt_loopback(vsock, pkt);
+   if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
+   len = virtio_transport_send_pkt_loopback(vsock, pkt);
+   goto out_rcu;
+   }
  
  	if (pkt->reply)
atomic_inc(&vsock->queued_replies);
@@ -214,6 +221,9 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
spin_unlock_bh(&vsock->send_pkt_list_lock);
  
  	queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
+
+out_rcu:
+   rcu_read_unlock();
return len;
  }
  
@@ -222,12 +232,14 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
  {
struct virtio_vsock *vsock;
struct virtio_vsock_pkt *pkt, *n;
-   int cnt = 0;
+   int cnt = 0, ret;
LIST_HEAD(freeme);
  
-	vsock = virtio_vsock_get();
+   rcu_read_lock();
+   vsock = rcu_dereference(the_virtio_vsock);
if (!vsock) {
-   return -ENODEV;
+   ret = -ENODEV;
+   goto out_rcu;
}
  
  	spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -255,7 +267,11 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
queue_work(virtio_vsock_workqueue, &vsock->rx_work);
}
  
-	return 0;
+   ret = 0;
+
+out_rcu:
+   rcu_read_unlock();
+   return ret;
  }
  
  static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
@@ -590,8 +606,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
vsock->rx_buf_max_nr = 0;
atomic_set(&vsock->queued_replies, 0);
  
-	vdev->priv = vsock;
-   the_virtio_vsock = vsock;
mutex_init(&vsock->tx_lock);
mutex_init(&vsock->rx_lock);
mutex_init(&vsock->event_lock);
@@ -613,6 +627,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
virtio_vsock_event_fill(vsock);
mutex_unlock(&vsock->event_lock);
  
+	vdev->priv = vsock;
+   rcu_assign_pointer(the_virtio_vsock, vsock);

You probably need to use rcu_dereference_protected() to access
the_virtio_vsock in the function in order to survive from sparse.

+
mutex_unlock(&the_virtio_vsock_mutex);
return 0;
  
@@ -627,6 +644,12 @@ static void virtio_vsock_remove(struct virtio_device *vdev)

struct virtio_vsock *vsock = vdev->priv;
struct virtio_vsock_pkt *pkt;
  
+	mutex_lock(&the_virtio_vsock_mutex);
+
+   vdev->priv = NULL;
+   rcu_assign_pointer(the_virtio_vsock, NULL);

This is still suspici

Re: [PATCH v2 0/3] vsock/virtio: several fixes in the .probe() and .remove()

2019-07-03 Thread Stefano Garzarella
On Wed, Jul 03, 2019 at 10:14:53AM +0100, Stefan Hajnoczi wrote:
> On Mon, Jul 01, 2019 at 07:03:57PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 01, 2019 at 04:11:13PM +0100, Stefan Hajnoczi wrote:
> > > On Fri, Jun 28, 2019 at 02:36:56PM +0200, Stefano Garzarella wrote:
> > > > During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> > > > before registering the driver", Stefan pointed out some possible issues
> > > > in the .probe() and .remove() callbacks of the virtio-vsock driver.
> > > > 
> > > > This series tries to solve these issues:
> > > > - Patch 1 adds RCU critical sections to avoid use-after-free of
> > > >   'the_virtio_vsock' pointer.
> > > > - Patch 2 stops workers before to call vdev->config->reset(vdev) to
> > > >   be sure that no one is accessing the device.
> > > > - Patch 3 moves the works flush at the end of the .remove() to avoid
> > > >   use-after-free of 'vsock' object.
> > > > 
> > > > v2:
> > > > - Patch 1: use RCU to protect 'the_virtio_vsock' pointer
> > > > - Patch 2: no changes
> > > > - Patch 3: flush works only at the end of .remove()
> > > > - Removed patch 4 because virtqueue_detach_unused_buf() returns all the 
> > > > buffers
> > > >   allocated.
> > > > 
> > > > v1: https://patchwork.kernel.org/cover/10964733/
> > > 
> > > This looks good to me.
> > 
> > Thanks for the review!
> > 
> > > 
> > > Did you run any stress tests?  For example an SMP guest constantly
> > > connecting and sending packets together with a script that
> > > hotplug/unplugs vhost-vsock-pci from the host side.
> > 
> > Yes, I started an SMP guest (-smp 4 -monitor 
> > tcp:127.0.0.1:1234,server,nowait)
> > and I run these scripts to stress the .probe()/.remove() path:
> > 
> > - guest
> >   while true; do
> >   cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
> >   cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
> >   cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
> >   cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
> >   wait
> >   done
> > 
> > - host
> >   while true; do
> >   cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
> >   cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
> >   cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
> >   cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
> >   sleep 2
> >   echo "device_del v1" | nc 127.0.0.1 1234
> >   sleep 1
> >   echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 
> > 1234
> >   sleep 1
> >   done
> > 
> > Do you think is enough or is better to have a test more accurate?
> 
> That's good when left running overnight so that thousands of hotplug
> events are tested.

Honestly, I ran the test for ~30 mins (because without the patch the
crash happens in a few seconds), but of course, I'll run it tonight :)

Thanks,
Stefano


Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Jason Wang


On 2019/7/3 5:13 PM, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==========================

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here: https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
  #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.



Still a comment like V1: what's the advantage of inventing a new
protocol? I believe either of the following would be better:


- using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL
and extend it with e.g. a notify region. The advantage is that all
existing userspace programs could be reused without modification (or
with minimal modification). And the vhost API hides lots of details
that need not be understood by the application (e.g. in the case of a
container).


- using the PCI layout, you don't even need to re-invent the notify
region at all, and we can pass it through to the guest.


Personally, I prefer vhost ioctl.




Each message will be written to or read from this region at offset 0:

int vhost_vfio_write(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
int ret;

ret = pwrite64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count)
return -1;

return 0;
}

int vhost_vfio_read(struct vhost_dev *dev, struct vhost_vfio_op *op)
{
int count = VHOST_VFIO_OP_HDR_SIZE + op->size;
struct vhost_vfio *vfio = dev->opaque;
uint64_t request = op->request;
int ret;

ret = pread64(vfio->device_fd, op, count, vfio->config_offset);
if (ret != count || request != op->request)
return -1;

return 0;
}

It's quite straightforward to set things to the device. Just need to
write the message to device directly:

int vhost_vfio_set_features(struct vhost_dev *dev, uint64_t features)
{
struct vhost_vfio_op op;

op.request = VHOST_SET_FEATURES;
op.flags = 0;
op.size = sizeof(features);
op.payload.u64 = features;

return vhost_vfio_write(dev, &op);
}

To get things from the device, two steps are needed.
Take VHOST_GET_FEATURES as an example:

int vhost_vfio_get_features(struct vhost_dev *dev, uint64_t *features)
{
struct vhost_vfio_op op;
int ret;

op.request = VHOST_GET_FEATURES;
op.flags = VHOST_VFIO_NEED_REPLY;
op.size = 0;

/* Just need to write the header */
ret = vhost_vfio_write(dev, &op);
if (ret != 0)
goto out;

/* `op` wasn't changed during write */
op.flags = 0;
op.size = sizeof(*features);

ret = vhost_vfio_read(dev, &op);
if (ret != 0)
goto out;

*features = op.payload.u64;
out:
return ret;
}

2. NOTIFY_REGION (mmap-able)

The region described by NOTIFY_REGION will be used to notify
the device.

Each queue will have a page for notification, and it can be mapped
to VM (if hardware also supports), and the virtio driver in the VM
will be able to notify the device directly.

The region described by NOTIFY_REGION is also write-able. If
the accelerator's notification register(s) cannot be mapped to
the VM, write() can also be used to notify the device. Something
like this:

void notify_relay(void *opaque)
{
..
offset = host_page_size * queue_idx;

ret = pwrite64(vfio->device_fd, &queue_idx, sizeof(queue_idx),
vfio->notify_offset + offset);
..
}

3. VFIO interrupt ioctl API

VFIO interrupt ioctl API is used to setup device interrupts.
IRQ-bypass can also be supported.

Currently, the data path interrupt can be configured via the
VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.



How about DMA API? Do you expect to use VFIO IOMMU API or using vhost 
SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with 
SET_MEM_TABLE DMA can

Re: [PATCH v2 1/3] vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock

2019-07-03 Thread Stefano Garzarella
On Wed, Jul 03, 2019 at 05:53:58PM +0800, Jason Wang wrote:
> 
> On 2019/6/28 8:36 PM, Stefano Garzarella wrote:
> > Some callbacks used by the upper layers can run while we are in the
> > .remove(). A potential use-after-free can happen, because we free
> > the_virtio_vsock without knowing if the callbacks are over or not.
> > 
> > To solve this issue we move the assignment of the_virtio_vsock at the
> > end of .probe(), when we finished all the initialization, and at the
> > beginning of .remove(), before to release resources.
> > For the same reason, we do the same also for the vdev->priv.
> > 
> > We use RCU to be sure that all callbacks that use the_virtio_vsock
> > ended before freeing it. This is not required for callbacks that
> > use vdev->priv, because after the vdev->config->del_vqs() we are sure
> > that they are ended and will no longer be invoked.
> > 
> > We also take the mutex during the .remove() to avoid that .probe() can
> > run while we are resetting the device.
> > 
> > Signed-off-by: Stefano Garzarella 
> > ---
> >   net/vmw_vsock/virtio_transport.c | 67 +---
> >   1 file changed, 44 insertions(+), 23 deletions(-)
> > 
> > diff --git a/net/vmw_vsock/virtio_transport.c 
> > b/net/vmw_vsock/virtio_transport.c
> > index 9c287e3e393c..7ad510ec12e0 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -65,19 +65,22 @@ struct virtio_vsock {
> > u32 guest_cid;
> >   };
> > -static struct virtio_vsock *virtio_vsock_get(void)
> > -{
> > -   return the_virtio_vsock;
> > -}
> > -
> >   static u32 virtio_transport_get_local_cid(void)
> >   {
> > -   struct virtio_vsock *vsock = virtio_vsock_get();
> > +   struct virtio_vsock *vsock;
> > +   u32 ret;
> > -   if (!vsock)
> > -   return VMADDR_CID_ANY;
> > +   rcu_read_lock();
> > +   vsock = rcu_dereference(the_virtio_vsock);
> > +   if (!vsock) {
> > +   ret = VMADDR_CID_ANY;
> > +   goto out_rcu;
> > +   }
> > -   return vsock->guest_cid;
> > +   ret = vsock->guest_cid;
> > +out_rcu:
> > +   rcu_read_unlock();
> > +   return ret;
> >   }
> >   static void virtio_transport_loopback_work(struct work_struct *work)
> > @@ -197,14 +200,18 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt 
> > *pkt)
> > struct virtio_vsock *vsock;
> > int len = pkt->len;
> > -   vsock = virtio_vsock_get();
> > +   rcu_read_lock();
> > +   vsock = rcu_dereference(the_virtio_vsock);
> > if (!vsock) {
> > virtio_transport_free_pkt(pkt);
> > -   return -ENODEV;
> > +   len = -ENODEV;
> > +   goto out_rcu;
> > }
> > -   if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid)
> > -   return virtio_transport_send_pkt_loopback(vsock, pkt);
> > +   if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
> > +   len = virtio_transport_send_pkt_loopback(vsock, pkt);
> > +   goto out_rcu;
> > +   }
> > if (pkt->reply)
> > atomic_inc(&vsock->queued_replies);
> > @@ -214,6 +221,9 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> > spin_unlock_bh(&vsock->send_pkt_list_lock);
> > queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
> > +
> > +out_rcu:
> > +   rcu_read_unlock();
> > return len;
> >   }
> > @@ -222,12 +232,14 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> >   {
> > struct virtio_vsock *vsock;
> > struct virtio_vsock_pkt *pkt, *n;
> > -   int cnt = 0;
> > +   int cnt = 0, ret;
> > LIST_HEAD(freeme);
> > -   vsock = virtio_vsock_get();
> > +   rcu_read_lock();
> > +   vsock = rcu_dereference(the_virtio_vsock);
> > if (!vsock) {
> > -   return -ENODEV;
> > +   ret = -ENODEV;
> > +   goto out_rcu;
> > }
> > spin_lock_bh(&vsock->send_pkt_list_lock);
> > @@ -255,7 +267,11 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
> > queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> > }
> > -   return 0;
> > +   ret = 0;
> > +
> > +out_rcu:
> > +   rcu_read_unlock();
> > +   return ret;
> >   }
> >   static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> > @@ -590,8 +606,6 @@ static int virtio_vsock_probe(struct virtio_device 
> > *vdev)
> > vsock->rx_buf_max_nr = 0;
> > atomic_set(&vsock->queued_replies, 0);
> > -   vdev->priv = vsock;
> > -   the_virtio_vsock = vsock;
> > mutex_init(&vsock->tx_lock);
> > mutex_init(&vsock->rx_lock);
> > mutex_init(&vsock->event_lock);
> > @@ -613,6 +627,9 @@ static int virtio_vsock_probe(struct virtio_device 
> > *vdev)
> > virtio_vsock_event_fill(vsock);
> > mutex_unlock(&vsock->event_lock);
> > +   vdev->priv = vsock;
> > +   rcu_assign_pointer(the_virtio_vsock, vsock);
> 
> 
> You probably need to use rcu_dereference_protected() to access
> the_virtio_vsock in the function in order to survive from sparse.
> 

Ooo, thanks!

Do you mean when we check if the_virtio_vsock is not null at the beginning of

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> On 2019/7/3 5:13 PM, Tiwei Bie wrote:
> > Details about this can be found here:
> > 
> > https://lwn.net/Articles/750770/
> > 
> > What's new in this version
> > ==
> > 
> > A new VFIO device type is introduced - vfio-vhost. This addressed
> > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > 
> > Below is the updated device interface:
> > 
> > Currently, there are two regions of this device: 1) CONFIG_REGION
> > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > can be used to notify the device.
> > 
> > 1. CONFIG_REGION
> > 
> > The region described by CONFIG_REGION is the main control interface.
> > Messages will be written to or read from this region.
> > 
> > The message type is determined by the `request` field in message
> > header. The message size is encoded in the message header too.
> > The message format looks like this:
> > 
> > struct vhost_vfio_op {
> > __u64 request;
> > __u32 flags;
> > /* Flag values: */
> >   #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > __u32 size;
> > union {
> > __u64 u64;
> > struct vhost_vring_state state;
> > struct vhost_vring_addr addr;
> > } payload;
> > };
> > 
> > The existing vhost-kernel ioctl cmds are reused as the message
> > requests in above structure.
> 
> 
> Still a comments like V1. What's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..

> I believe either of the following should be better:
> 
> - using vhost ioctl,  we can start from SET_VRING_KICK/SET_VRING_CALL and
> extend it with e.g notify region. The advantages is that all exist userspace
> program could be reused without modification (or minimal modification). And
> vhost API hides lots of details that is not necessary to be understood by
> application (e.g in the case of container).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

> 
> - using PCI layout, then you don't even need to re-invent notifiy region at
> all and we can pass-through them to guest.

Like what you said previously, virtio has transports other than PCI.
And it will look a bit odd when using transports other than PCI..

> 
> Personally, I prefer vhost ioctl.

+1

> 
> 
> > 
[...]
> > 
> > 3. VFIO interrupt ioctl API
> > 
> > VFIO interrupt ioctl API is used to setup device interrupts.
> > IRQ-bypass can also be supported.
> > 
> > Currently, the data path interrupt can be configured via the
> > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> 
> 
> How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
> SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
> SET_MEM_TABLE DMA can be done at the level of parent device which means it
> can work for e.g the card with on-chip IOMMU.

Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
to do the DMA programming. But like what you said, there could be
a problem when using cards with on-chip IOMMU.

> 
> And what's the plan for vIOMMU?

As this RFC assumes userspace will use VFIO IOMMU API, userspace
just needs to follow the same way like what vfio-pci device does
in QEMU to support vIOMMU.

> 
> 
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> >   drivers/vhost/Makefile |   2 +
> >   drivers/vhost/vdpa.c   | 770 +
> >   include/linux/vdpa_mdev.h  |  72 
> >   include/uapi/linux/vfio.h  |  19 +
> >   include/uapi/linux/vhost.h |  25 ++
> >   5 files changed, 888 insertions(+)
> >   create mode 100644 drivers/vhost/vdpa.c
> >   create mode 100644 include/linux/vdpa_mdev.h
> 
> 
> We probably need some sample parent device implementation. It could be a
> software datapath like e.g we can start from virtio-net device in guest or a
> vhost/tap on host.

Yeah, something like this would be interesting!

Thanks,
Tiwei

> 
> Thanks
> 
> 
> > 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Jason Wang


On 2019/7/3 下午7:52, Tiwei Bie wrote:
> On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
>> On 2019/7/3 下午5:13, Tiwei Bie wrote:
>>> Details about this can be found here:
>>>
>>> https://lwn.net/Articles/750770/
>>>
>>> What's new in this version
>>> ==========================
>>>
>>> A new VFIO device type is introduced - vfio-vhost. This addressed
>>> some comments from here: https://patchwork.ozlabs.org/cover/984763/
>>>
>>> Below is the updated device interface:
>>>
>>> Currently, there are two regions of this device: 1) CONFIG_REGION
>>> (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
>>> device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
>>> can be used to notify the device.
>>>
>>> 1. CONFIG_REGION
>>>
>>> The region described by CONFIG_REGION is the main control interface.
>>> Messages will be written to or read from this region.
>>>
>>> The message type is determined by the `request` field in the message
>>> header. The message size is encoded in the message header too.
>>> The message format looks like this:
>>>
>>> struct vhost_vfio_op {
>>> 	__u64 request;
>>> 	__u32 flags;
>>> 	/* Flag values: */
>>> #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
>>> 	__u32 size;
>>> 	union {
>>> 		__u64 u64;
>>> 		struct vhost_vring_state state;
>>> 		struct vhost_vring_addr addr;
>>> 	} payload;
>>> };
>>>
>>> The existing vhost-kernel ioctl cmds are reused as the message
>>> requests in the above structure.
>>
>> Still the same comment as for V1: what's the advantage of inventing a
>> new protocol?
> I'm trying to make it work in VFIO's way..
>
>> I believe either of the following should be better:
>>
>> - using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
>> extend it with e.g. a notify region. The advantage is that all existing
>> userspace programs could be reused without modification (or with minimal
>> modification). And the vhost API hides lots of details that need not be
>> understood by the application (e.g. in the case of containers).
> Do you mean reusing vhost's ioctl on VFIO device fd directly,
> or introducing another mdev driver (i.e. vhost_mdev instead of
> using the existing vfio_mdev) for mdev device?


Can we simply add them into ioctl of mdev_parent_ops?


>> - using the PCI layout, then you don't even need to re-invent the notify
>> region at all and we can pass it through to the guest.
> Like you said previously, virtio has transports other than PCI,
> and it would look a bit odd when using those other transports..


Yes.


>> Personally, I prefer vhost ioctl.
> +1
>
[...]
>>> 3. VFIO interrupt ioctl API
>>>
>>> VFIO interrupt ioctl API is used to setup device interrupts.
>>> IRQ-bypass can also be supported.
>>>
>>> Currently, the data path interrupt can be configured via the
>>> VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
>>
>> How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
>> SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
>> SET_MEM_TABLE DMA can be done at the level of parent device which means it
>> can work for e.g the card with on-chip IOMMU.
> Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
> to do the DMA programming. But like what you said, there could be
> a problem when using cards with on-chip IOMMU.


Yes, another issue is SET_MEM_TABLE can not be used to update just a
part of the table. This seems less flexible than the VFIO API, but it
could be extended.


>> And what's the plan for vIOMMU?
> As this RFC assumes userspace will use VFIO IOMMU API, userspace
> just needs to follow the same way like what vfio-pci device does
> in QEMU to support vIOMMU.


Right, this is more a question for the qemu part. It means it needs to
go for the ordinary VFIO path to get all notifiers/listeners support
from vIOMMU.


>>> Signed-off-by: Tiwei Bie 
>>> ---
>>>   drivers/vhost/Makefile |   2 +
>>>   drivers/vhost/vdpa.c   | 770 +
>>>   include/linux/vdpa_mdev.h  |  72 
>>>   include/uapi/linux/vfio.h  |  19 +
>>>   include/uapi/linux/vhost.h |  25 ++
>>>   5 files changed, 888 insertions(+)
>>>   create mode 100644 drivers/vhost/vdpa.c
>>>   create mode 100644 include/linux/vdpa_mdev.h
>>
>> We probably need some sample parent device implementation. It could be a
>> software datapath like e.g we can start from virtio-net device in guest or a
>> vhost/tap on host.
> Yeah, something like this would be interesting!


Plan to do something like that :) ?

Thanks


> Thanks,
> Tiwei
>
>> Thanks




Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> On 2019/7/3 下午7:52, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > On 2019/7/3 下午5:13, Tiwei Bie wrote:
> > > > Details about this can be found here:
> > > > 
> > > > https://lwn.net/Articles/750770/
> > > > 
> > > > What's new in this version
> > > > ==========================
> > > > 
> > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > some comments from here: https://patchwork.ozlabs.org/cover/984763/
> > > > 
> > > > Below is the updated device interface:
> > > > 
> > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > can be used to notify the device.
> > > > 
> > > > 1. CONFIG_REGION
> > > > 
> > > > The region described by CONFIG_REGION is the main control interface.
> > > > Messages will be written to or read from this region.
> > > > 
> > > > The message type is determined by the `request` field in the message
> > > > header. The message size is encoded in the message header too.
> > > > The message format looks like this:
> > > > 
> > > > struct vhost_vfio_op {
> > > > 	__u64 request;
> > > > 	__u32 flags;
> > > > 	/* Flag values: */
> > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > 	__u32 size;
> > > > 	union {
> > > > 		__u64 u64;
> > > > 		struct vhost_vring_state state;
> > > > 		struct vhost_vring_addr addr;
> > > > 	} payload;
> > > > };
> > > > 
> > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > requests in the above structure.
> > > 
> > > Still the same comment as for V1: what's the advantage of inventing a new
> > > protocol?
> > I'm trying to make it work in VFIO's way..
> > 
> > > I believe either of the following should be better:
> > > 
> > > - using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
> > > extend it with e.g. a notify region. The advantage is that all existing
> > > userspace programs could be reused without modification (or with minimal
> > > modification). And the vhost API hides lots of details that need not be
> > > understood by the application (e.g. in the case of containers).
> > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > or introducing another mdev driver (i.e. vhost_mdev instead of
> > using the existing vfio_mdev) for mdev device?
> 
> 
> Can we simply add them into ioctl of mdev_parent_ops?

Right, either way these ioctls have to be (and just need to be)
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is which file descriptor the userspace
will do the ioctl() on. So I'm wondering, do you mean letting the
userspace do the ioctl() on the VFIO device fd of the mdev
device?

> 
> 
> > 
[...]
> > > > 3. VFIO interrupt ioctl API
> > > > 
> > > > VFIO interrupt ioctl API is used to setup device interrupts.
> > > > IRQ-bypass can also be supported.
> > > > 
> > > > Currently, the data path interrupt can be configured via the
> > > > VFIO_VHOST_VQ_IRQ_INDEX with virtqueue's callfd.
> > > 
> > > How about DMA API? Do you expect to use VFIO IOMMU API or using vhost
> > > SET_MEM_TABLE? VFIO IOMMU API is more generic for sure but with
> > > SET_MEM_TABLE DMA can be done at the level of parent device which means it
> > > can work for e.g the card with on-chip IOMMU.
> > Agree. In this RFC, it assumes userspace will use VFIO IOMMU API
> > to do the DMA programming. But like what you said, there could be
> > a problem when using cards with on-chip IOMMU.
> 
> 
> Yes, another issue is SET_MEM_TABLE can not be used to update just a part of
> the table. This seems less flexible than the VFIO API, but it could be extended.

Agree.

> 
> 
> > 
> > > And what's the plan for vIOMMU?
> > As this RFC assumes userspace will use VFIO IOMMU API, userspace
> > just needs to follow the same way like what vfio-pci device does
> > in QEMU to support vIOMMU.
> 
> 
> Right, this is more a question for the qemu part. It means it needs to go
> for the ordinary VFIO path to get all notifiers/listeners support from vIOMMU.

Yeah.

> 
> 
> > 
> > > 
> > > > Signed-off-by: Tiwei Bie 
> > > > ---
> > > >drivers/vhost/Makefile |   2 +
> > > >drivers/vhost/vdpa.c   | 770 
> > > > +
> > > >include/linux/vdpa_mdev.h  |  72 
> > > >include/uapi/linux/vfio.h  |  19 +
> > > >include/uapi/linux/vhost.h |  25 ++
> > > >5 files changed, 888 insertions(+)
> > > >create mode 100644 drivers/vhost/vdpa.c
> > > >create mode 100644 include/linux/vdpa_mdev.h
> > > 
> > > We probably need some sample parent device implementation. It could be a
> > > software datapath like e.g we can start from virtio-net device in guest 
> > > or a
> > > vhost/

Re: [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

2019-07-03 Thread Juergen Gross

On 03.07.19 01:51, Nadav Amit wrote:
> To improve TLB shootdown performance, flush the remote and local TLBs
> concurrently. Introduce flush_tlb_multi() that does so. Introduce
> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
> and hyper-v are only compile-tested).
>
> While the updated smp infrastructure is capable of running a function on
> a single local core, it is not optimized for this case. The multiple
> function calls and the indirect branch introduce some overhead, and
> might make local TLB flushes slower than they were before the recent
> changes.
>
> Before calling the SMP infrastructure, check if only a local TLB flush
> is needed to restore the lost performance in this common case. This
> requires checking mm_cpumask() one more time, but unless this mask is
> updated very frequently, this should not impact performance negatively.
>
> Cc: "K. Y. Srinivasan" 
> Cc: Haiyang Zhang 
> Cc: Stephen Hemminger 
> Cc: Sasha Levin 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Cc: Juergen Gross 
> Cc: Paolo Bonzini 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Boris Ostrovsky 
> Cc: linux-hyp...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: k...@vger.kernel.org
> Cc: xen-de...@lists.xenproject.org
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/hyperv/mmu.c | 13 +++---
>  arch/x86/include/asm/paravirt.h   |  6 +--
>  arch/x86/include/asm/paravirt_types.h |  4 +-
>  arch/x86/include/asm/tlbflush.h   |  9 ++--
>  arch/x86/include/asm/trace/hyperv.h   |  2 +-
>  arch/x86/kernel/kvm.c | 11 +++--
>  arch/x86/kernel/paravirt.c|  2 +-
>  arch/x86/mm/tlb.c | 65 ---
>  arch/x86/xen/mmu_pv.c | 20 ++---
>  include/trace/events/xen.h|  2 +-
>  10 files changed, 91 insertions(+), 43 deletions(-)

...

> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index beb44e22afdf..19e481e6e904 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>  	preempt_enable();
>  }
>
> -static void xen_flush_tlb_others(const struct cpumask *cpus,
> -				 const struct flush_tlb_info *info)
> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
> +				const struct flush_tlb_info *info)
>  {
>  	struct {
>  		struct mmuext_op op;
> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>  	const size_t mc_entry_size = sizeof(args->op) +
>  		sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>
> -	trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
> +	trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>
>  	if (cpumask_empty(cpus))
>  		return; /* nothing to do */
>
> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>  	args = mcs.args;
>  	args->op.arg2.vcpumask = to_cpumask(args->mask);
>
> -	/* Remove us, and any offline CPUS. */
> +	/* Flush locally if needed and remove us */
> +	if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
> +		local_irq_disable();
> +		flush_tlb_func_local(info);


I think this isn't the correct function for PV guests.

In fact it should be much easier: just don't clear the own cpu from the
mask, that's all what's needed. The hypervisor is just fine having the
current cpu in the mask and it will do the right thing.
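
Something like this should be all that is needed (untested sketch, just
dropping the cpumask_clear_cpu() of the local cpu):

	/* Keep the current cpu in the mask; only drop offline cpus.
	 * The hypervisor flushes the local vcpu together with the rest. */
	cpumask_and(to_cpumask(args->mask), cpus, cpu_online_mask);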


Juergen


Re: [PATCH 1/5] drm/client: Support unmapping of DRM client buffers

2019-07-03 Thread Noralf Trønnes



Den 03.07.2019 10.32, skrev Thomas Zimmermann:
> DRM clients, such as the fbdev emulation, have their buffer objects
> mapped by default. Mapping a buffer implicitly prevents its relocation.
> Hence, the buffer may permanently consume video memory while it's
> allocated. This is a problem for drivers of low-memory devices, such as
> ast, mgag200 or older framebuffer hardware, which will then not have
> enough memory to display other content (e.g., X11).
> 
> This patch introduces drm_client_buffer_vmap() and _vunmap(). Internal
> DRM clients can use these functions to unmap and remap buffer objects
> as needed.
> 
> There's no reference counting for vmap operations. Callers are expected
> to either keep buffers mapped (as it is now), or call vmap and vunmap
> in pairs around code that accesses the mapped memory.
> 
> Signed-off-by: Thomas Zimmermann 
> ---
>  drivers/gpu/drm/drm_client.c | 71 +++-
>  include/drm/drm_client.h |  3 ++
>  2 files changed, 64 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
> index 410572f14257..d04660c4470a 100644
> --- a/drivers/gpu/drm/drm_client.c
> +++ b/drivers/gpu/drm/drm_client.c
> @@ -235,7 +235,8 @@ static void drm_client_buffer_delete(struct 
> drm_client_buffer *buffer)
>  {
>   struct drm_device *dev = buffer->client->dev;
>  
> - drm_gem_vunmap(buffer->gem, buffer->vaddr);
> + if (buffer->vaddr)

No need for this, drm_gem_vunmap() has a NULL check.

> + drm_gem_vunmap(buffer->gem, buffer->vaddr);
>  
>   if (buffer->gem)
>   drm_gem_object_put_unlocked(buffer->gem);
> @@ -281,6 +282,43 @@ drm_client_buffer_create(struct drm_client_dev *client, 
> u32 width, u32 height, u
>  
>   buffer->gem = obj;
>  
> + vaddr = drm_client_buffer_vmap(buffer);

I think we should change this and _not_ vmap on buffer creation.
Eventually we'll get bootsplash and console clients and they will also
have to deal with these low memory devices. AFAICS the only client that
will need a virtual address at all times is the fbdev client when it
doesn't use a shadow buffer.
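
Clients that only need temporary access would then wrap each update,
e.g. (sketch using the helpers introduced by this patch):

	void *vaddr = drm_client_buffer_vmap(buffer);

	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);
	/* ... draw into the mapped buffer ... */
	drm_client_buffer_vunmap(buffer);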

> + if (IS_ERR(vaddr)) {
> + ret = PTR_ERR(vaddr);
> + goto err_delete;
> + }
> +
> + return buffer;
> +
> +err_delete:
> + drm_client_buffer_delete(buffer);
> +
> + return ERR_PTR(ret);
> +}
> +
> +/**
> + * drm_client_buffer_vmap - Map DRM client buffer into address space
> + * @buffer: DRM client buffer
> + *
> + * This function maps a client buffer into kernel address space. If the
> + * buffer is already mapped, it returns the mapping's address.
> + *
> + * Client buffer mappings are not ref'counted. Each call to
> + * drm_client_buffer_vmap() should be followed by a call to
> + * drm_client_buffer_vunmap(); or the client buffer should be mapped
> + * throughout its lifetime. The latter is the default.
> + *
> + * Returns:
> + *   The mapped memory's address
> + */
> +void *
> +drm_client_buffer_vmap(struct drm_client_buffer *buffer)
> +{
> + void *vaddr;
> +
> + if (buffer->vaddr)
> + return buffer->vaddr;
> +
>   /*
>* FIXME: The dependency on GEM here isn't required, we could
>* convert the driver handle to a dma-buf instead and use the
> @@ -289,21 +327,34 @@ drm_client_buffer_create(struct drm_client_dev *client, 
> u32 width, u32 height, u
>* fd_install step out of the driver backend hooks, to make that
>* final step optional for internal users.
>*/
> - vaddr = drm_gem_vmap(obj);
> - if (IS_ERR(vaddr)) {
> - ret = PTR_ERR(vaddr);
> - goto err_delete;
> - }
> + vaddr = drm_gem_vmap(buffer->gem);
> + if (IS_ERR(vaddr))
> + return vaddr;
>  
>   buffer->vaddr = vaddr;
>  
> - return buffer;
> + return vaddr;
> +}
> +EXPORT_SYMBOL(drm_client_buffer_vmap);
>  
> -err_delete:
> - drm_client_buffer_delete(buffer);
> +/**
> + * drm_client_buffer_vunmap - Unmap DRM client buffer
> + * @buffer: DRM client buffer
> + *
> + * This function removes a client buffer's memory mmapping. This
> + * function is only required by clients that manage their buffers
> + * by themselves. By default, DRM client buffers are mapped throughout
> + * their entire lifetime.
> + */
> +void drm_client_buffer_vunmap(struct drm_client_buffer *buffer)
> +{
> + if (!buffer->vaddr)

No need for a NULL check here either.

Noralf.

> + return;
>  
> - return ERR_PTR(ret);
> + drm_gem_vunmap(buffer->gem, buffer->vaddr);
> + buffer->vaddr = NULL;
>  }
> +EXPORT_SYMBOL(drm_client_buffer_vunmap);
>  
>  static void drm_client_buffer_rmfb(struct drm_client_buffer *buffer)
>  {
> diff --git a/include/drm/drm_client.h b/include/drm/drm_client.h
> index 72d51d1e9dd9..e1db1d9da0bf 100644
> --- a/include/drm/drm_client.h
> +++ b/include/drm/drm_client.h
> @@ -149,6 +149,9 @@ struct drm_client_buffer {
>  struct

Re: [PATCH 3/5] drm/ast: Replace struct ast_fbdev with generic framebuffer emulation

2019-07-03 Thread Noralf Trønnes


Den 03.07.2019 10.33, skrev Thomas Zimmermann:
> This patch replaces ast's framebuffer console with DRM's generic
> implementation. All respective code is being removed from the driver.
> 
> The console is set up with a shadow buffer. The actual buffer object is
> not permanently pinned in video ram, but just another buffer object that
> the driver moves in and out of vram as necessary. The driver's function
> ast_crtc_do_set_base() used to contain special handling for the framebuffer
> console. With the new generic framebuffer, the driver does not need this
> code any longer.
> 
> Signed-off-by: Thomas Zimmermann 
> ---
>  drivers/gpu/drm/ast/Makefile   |   2 +-
>  drivers/gpu/drm/ast/ast_drv.c  |  22 ++-
>  drivers/gpu/drm/ast/ast_drv.h  |  17 --
>  drivers/gpu/drm/ast/ast_fb.c   | 341 -
>  drivers/gpu/drm/ast/ast_main.c |  30 ++-
>  drivers/gpu/drm/ast/ast_mode.c |  21 --
>  6 files changed, 41 insertions(+), 392 deletions(-)
>  delete mode 100644 drivers/gpu/drm/ast/ast_fb.c
> 
> diff --git a/drivers/gpu/drm/ast/Makefile b/drivers/gpu/drm/ast/Makefile
> index b086dae17013..561f7c4199e4 100644
> --- a/drivers/gpu/drm/ast/Makefile
> +++ b/drivers/gpu/drm/ast/Makefile
> @@ -3,6 +3,6 @@
>  # Makefile for the drm device driver.  This driver provides support for the
>  # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
>  
> -ast-y := ast_drv.o ast_main.o ast_mode.o ast_fb.o ast_ttm.o ast_post.o 
> ast_dp501.o
> +ast-y := ast_drv.o ast_main.o ast_mode.o ast_ttm.o ast_post.o ast_dp501.o
>  
>  obj-$(CONFIG_DRM_AST) := ast.o
> diff --git a/drivers/gpu/drm/ast/ast_drv.c b/drivers/gpu/drm/ast/ast_drv.c
> index 3811997e78c4..75f55ac1a218 100644
> --- a/drivers/gpu/drm/ast/ast_drv.c
> +++ b/drivers/gpu/drm/ast/ast_drv.c
> @@ -103,25 +103,29 @@ static int ast_drm_freeze(struct drm_device *dev)
>  
>   pci_save_state(dev->pdev);
>  
> - console_lock();
> - ast_fbdev_set_suspend(dev, 1);
> - console_unlock();
> + if (dev->fb_helper) {
> + console_lock();
> + drm_fb_helper_set_suspend(dev->fb_helper, 1);
> + console_unlock();
> + }

You can call drm_fb_helper_set_suspend_unlocked() unconditionally here
and it will do the NULL check and locking for you.
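
i.e. (untested) the hunk can just become:

	drm_fb_helper_set_suspend_unlocked(dev->fb_helper, 1);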

> +
>   return 0;
>  }
>  
>  static int ast_drm_thaw(struct drm_device *dev)
>  {
> - int error = 0;
> -
>   ast_post_gpu(dev);
>  
>   drm_mode_config_reset(dev);
>   drm_helper_resume_force_mode(dev);
>  
> - console_lock();
> - ast_fbdev_set_suspend(dev, 0);
> - console_unlock();
> - return error;
> + if (dev->fb_helper) {
> + console_lock();
> + drm_fb_helper_set_suspend(dev->fb_helper, 0);
> + console_unlock();
> + }

Same here.

With that:

Acked-by: Noralf Trønnes 

> +
> + return 0;
>  }
>  
>  static int ast_drm_resume(struct drm_device *dev)
> diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
> index ca794a3fcf8f..907d7480b7ba 100644
> --- a/drivers/gpu/drm/ast/ast_drv.h
> +++ b/drivers/gpu/drm/ast/ast_drv.h
> @@ -81,8 +81,6 @@ enum ast_tx_chip {
>  #define AST_DRAM_4Gx16   7
>  #define AST_DRAM_8Gx16   8
>  
> -struct ast_fbdev;
> -
>  struct ast_private {
>   struct drm_device *dev;
>  
> @@ -96,8 +94,6 @@ struct ast_private {
>   uint32_t mclk;
>   uint32_t vram_size;
>  
> - struct ast_fbdev *fbdev;
> -
>   int fb_mtrr;
>  
>   struct drm_gem_object *cursor_cache;
> @@ -239,14 +235,6 @@ struct ast_encoder {
>   struct drm_encoder base;
>  };
>  
> -struct ast_fbdev {
> - struct drm_fb_helper helper; /* must be first */
> - void *sysram;
> - int size;
> - int x1, y1, x2, y2; /* dirty rect */
> - spinlock_t dirty_lock;
> -};
> -
>  #define to_ast_crtc(x) container_of(x, struct ast_crtc, base)
>  #define to_ast_connector(x) container_of(x, struct ast_connector, base)
>  #define to_ast_encoder(x) container_of(x, struct ast_encoder, base)
> @@ -289,11 +277,6 @@ struct ast_vbios_mode_info {
>  extern int ast_mode_init(struct drm_device *dev);
>  extern void ast_mode_fini(struct drm_device *dev);
>  
> -int ast_fbdev_init(struct drm_device *dev);
> -void ast_fbdev_fini(struct drm_device *dev);
> -void ast_fbdev_set_suspend(struct drm_device *dev, int state);
> -void ast_fbdev_set_base(struct ast_private *ast, unsigned long gpu_addr);
> -
>  #define AST_MM_ALIGN_SHIFT 4
>  #define AST_MM_ALIGN_MASK ((1 << AST_MM_ALIGN_SHIFT) - 1)
>  
> diff --git a/drivers/gpu/drm/ast/ast_fb.c b/drivers/gpu/drm/ast/ast_fb.c
> deleted file mode 100644
> index a849e58b40bc..
> --- a/drivers/gpu/drm/ast/ast_fb.c
> +++ /dev/null
> @@ -1,341 +0,0 @@
> -/*
> - * Copyright 2012 Red Hat Inc.
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the
> - * "Software"), to deal in the Software without restriction, includin

Re: [PATCH 4/5] drm/bochs: Use shadow buffer for bochs framebuffer console

2019-07-03 Thread Noralf Trønnes


Den 03.07.2019 10.33, skrev Thomas Zimmermann:
> The bochs driver (and virtual hardware) requires buffer objects to
> reside in video ram to display them to the screen. So it can not
> display the framebuffer console because the respective buffer object
> is permanently pinned in system memory.
> 
> Using a shadow buffer for the console solves this problem. The console
> emulation will pin the buffer object only during updates from the shadow
> buffer. Otherwise, the bochs driver can freely relocate the buffer
> between system memory and video ram.
> 
> Signed-off-by: Thomas Zimmermann 
> ---

Acked-by: Noralf Trønnes 

Re: [PATCH 5/5] drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation

2019-07-03 Thread Noralf Trønnes


Den 03.07.2019 10.33, skrev Thomas Zimmermann:
> This patch replaces mgag200's framebuffer console with DRM's generic
> implementation. All respective code is being removed from the driver.
> 
> The console is set up with a shadow buffer. The actual buffer object is
> not permanently pinned in video ram, but just another buffer object that
> the driver moves in and out of vram as necessary. The driver's function
> mga_crtc_do_set_base() used to contain special handling for the framebuffer
> console. With the new generic framebuffer, the driver does not need this
> code any longer.
> 
> Signed-off-by: Thomas Zimmermann 
> ---



> diff --git a/drivers/gpu/drm/mgag200/mgag200_main.c 
> b/drivers/gpu/drm/mgag200/mgag200_main.c
> index b10f7265b5c4..6d943a008752 100644
> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> @@ -14,8 +14,33 @@
>  
>  #include "mgag200_drv.h"
>  
> +static int mga_framebuffer_dirtyfb(struct drm_framebuffer *fb,
> +struct drm_file *file_priv,
> +unsigned int flags,
> +unsigned int color,
> +struct drm_clip_rect *clips,
> +unsigned int num_clips)
> +{
> + /* empty placeholder function to enable fbcon shadow buffer */
> + return 0;
> +}
> +
> +static const struct drm_framebuffer_funcs mga_framebuffer_funcs = {
> + .destroy= drm_gem_fb_destroy,
> + .create_handle  = drm_gem_fb_create_handle,
> + .dirty  = mga_framebuffer_dirtyfb,
> +};
> +
> +static struct drm_framebuffer *
> +mga_mode_config_funcs_fb_create(struct drm_device *dev, struct drm_file 
> *file,
> + const struct drm_mode_fb_cmd2 *mode_cmd)
> +{
> + return drm_gem_fb_create_with_funcs(dev, file, mode_cmd,
> + &mga_framebuffer_funcs);
> +}
> +
>  static const struct drm_mode_config_funcs mga_mode_funcs = {
> - .fb_create = drm_gem_fb_create
> + .fb_create = mga_mode_config_funcs_fb_create
>  };
>  
>  static int mga_probe_vram(struct mga_device *mdev, void __iomem *mem)
> @@ -162,7 +187,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned 
> long flags)
>   if (IS_G200_SE(mdev) && mdev->mc.vram_size < (2048*1024))
>   dev->mode_config.preferred_depth = 16;
>   else
> - dev->mode_config.preferred_depth = 24;
> + dev->mode_config.preferred_depth = 32;

Please mention this change and the reason for it in the commit message.

>   dev->mode_config.prefer_shadow = 1;
>  
>   r = mgag200_modeset_init(mdev);
> @@ -186,6 +211,13 @@ int mgag200_driver_load(struct drm_device *dev, unsigned 
> long flags)
>   }
>   mdev->cursor.pixels_current = NULL;
>  
> + r = drm_fbdev_generic_setup(mdev->dev, 0);
> + if (r) {
> + dev_err(&dev->pdev->dev,
> + "drm_fbdev_generic_setup failed: %d\n", r);

drm_fbdev_generic_setup() will print an error message on failure so you
don't need to do it.
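
i.e. (untested):

	r = drm_fbdev_generic_setup(mdev->dev, 0);
	if (r)
		goto err_modeset;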

Acked-by: Noralf Trønnes 

> + goto err_modeset;
> + }
> +
>   return 0;
>  
>  err_modeset:
> @@ -204,32 +236,7 @@ void mgag200_driver_unload(struct drm_device *dev)
>   if (mdev == NULL)
>   return;
>   mgag200_modeset_fini(mdev);
> - mgag200_fbdev_fini(mdev);
>   drm_mode_config_cleanup(dev);
>   mgag200_mm_fini(mdev);
>   dev->dev_private = NULL;
>  }
> -
> -int mgag200_gem_create(struct drm_device *dev,
> -u32 size, bool iskernel,
> -struct drm_gem_object **obj)
> -{
> - struct drm_gem_vram_object *gbo;
> - int ret;
> -
> - *obj = NULL;
> -
> - size = roundup(size, PAGE_SIZE);
> - if (size == 0)
> - return -EINVAL;
> -
> - gbo = drm_gem_vram_create(dev, &dev->vram_mm->bdev, size, 0, false);
> - if (IS_ERR(gbo)) {
> - ret = PTR_ERR(gbo);
> - if (ret != -ERESTARTSYS)
> - DRM_ERROR("failed to allocate GEM object\n");
> - return ret;
> - }
> - *obj = &gbo->gem;
> - return 0;
> -}

Re: [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

2019-07-03 Thread Nadav Amit via Virtualization
> On Jul 3, 2019, at 7:04 AM, Juergen Gross  wrote:
> 
> On 03.07.19 01:51, Nadav Amit wrote:
>> To improve TLB shootdown performance, flush the remote and local TLBs
>> concurrently. Introduce flush_tlb_multi() that does so. Introduce
>> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
>> and hyper-v are only compile-tested).
>> While the updated smp infrastructure is capable of running a function on
>> a single local core, it is not optimized for this case. The multiple
>> function calls and the indirect branch introduce some overhead, and
>> might make local TLB flushes slower than they were before the recent
>> changes.
>> Before calling the SMP infrastructure, check if only a local TLB flush
>> is needed to restore the lost performance in this common case. This
>> requires checking mm_cpumask() one more time, but unless this mask is
>> updated very frequently, this should not impact performance negatively.
>> Cc: "K. Y. Srinivasan" 
>> Cc: Haiyang Zhang 
>> Cc: Stephen Hemminger 
>> Cc: Sasha Levin 
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: x...@kernel.org
>> Cc: Juergen Gross 
>> Cc: Paolo Bonzini 
>> Cc: Dave Hansen 
>> Cc: Andy Lutomirski 
>> Cc: Peter Zijlstra 
>> Cc: Boris Ostrovsky 
>> Cc: linux-hyp...@vger.kernel.org
>> Cc: linux-ker...@vger.kernel.org
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: k...@vger.kernel.org
>> Cc: xen-de...@lists.xenproject.org
>> Signed-off-by: Nadav Amit 
>> ---
>>  arch/x86/hyperv/mmu.c | 13 +++---
>>  arch/x86/include/asm/paravirt.h   |  6 +--
>>  arch/x86/include/asm/paravirt_types.h |  4 +-
>>  arch/x86/include/asm/tlbflush.h   |  9 ++--
>>  arch/x86/include/asm/trace/hyperv.h   |  2 +-
>>  arch/x86/kernel/kvm.c | 11 +++--
>>  arch/x86/kernel/paravirt.c|  2 +-
>>  arch/x86/mm/tlb.c | 65 ---
>>  arch/x86/xen/mmu_pv.c | 20 ++---
>>  include/trace/events/xen.h|  2 +-
>>  10 files changed, 91 insertions(+), 43 deletions(-)
> 
> ...
> 
>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>> index beb44e22afdf..19e481e6e904 100644
>> --- a/arch/x86/xen/mmu_pv.c
>> +++ b/arch/x86/xen/mmu_pv.c
>> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>>  	preempt_enable();
>>  }
>>
>> -static void xen_flush_tlb_others(const struct cpumask *cpus,
>> -				 const struct flush_tlb_info *info)
>> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
>> +				const struct flush_tlb_info *info)
>>  {
>>  struct {
>>  struct mmuext_op op;
>> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>  	const size_t mc_entry_size = sizeof(args->op) +
>>  		sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>>
>> -	trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
>> +	trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>>
>>  	if (cpumask_empty(cpus))
>>  		return; /* nothing to do */
>>
>> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>  	args = mcs.args;
>>  	args->op.arg2.vcpumask = to_cpumask(args->mask);
>>
>> -	/* Remove us, and any offline CPUS. */
>> +	/* Flush locally if needed and remove us */
>> +	if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
>> +		local_irq_disable();
>> +		flush_tlb_func_local(info);
> 
> I think this isn't the correct function for PV guests.
> 
> In fact it should be much easier: just don't clear the own cpu from the
> mask, that's all what's needed. The hypervisor is just fine having the
> current cpu in the mask and it will do the right thing.

Thanks. I will do so in v3. I don’t think Hyper-V people would want to do
the same, unfortunately, since it would induce VM-exit on TLB flushes. But
if they do - I’ll be able not to expose flush_tlb_func_local().


Re: [Xen-devel] [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

2019-07-03 Thread Andrew Cooper
On 03/07/2019 18:02, Nadav Amit wrote:
>> On Jul 3, 2019, at 7:04 AM, Juergen Gross  wrote:
>>
>> On 03.07.19 01:51, Nadav Amit wrote:
>>> To improve TLB shootdown performance, flush the remote and local TLBs
>>> concurrently. Introduce flush_tlb_multi() that does so. Introduce
>>> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
>>> and hyper-v are only compile-tested).
>>> While the updated smp infrastructure is capable of running a function on
>>> a single local core, it is not optimized for this case. The multiple
>>> function calls and the indirect branch introduce some overhead, and
>>> might make local TLB flushes slower than they were before the recent
>>> changes.
>>> Before calling the SMP infrastructure, check if only a local TLB flush
>>> is needed to restore the lost performance in this common case. This
>>> requires checking mm_cpumask() one more time, but unless this mask is
>>> updated very frequently, this should not impact performance negatively.
>>> Cc: "K. Y. Srinivasan" 
>>> Cc: Haiyang Zhang 
>>> Cc: Stephen Hemminger 
>>> Cc: Sasha Levin 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: Borislav Petkov 
>>> Cc: x...@kernel.org
>>> Cc: Juergen Gross 
>>> Cc: Paolo Bonzini 
>>> Cc: Dave Hansen 
>>> Cc: Andy Lutomirski 
>>> Cc: Peter Zijlstra 
>>> Cc: Boris Ostrovsky 
>>> Cc: linux-hyp...@vger.kernel.org
>>> Cc: linux-ker...@vger.kernel.org
>>> Cc: virtualization@lists.linux-foundation.org
>>> Cc: k...@vger.kernel.org
>>> Cc: xen-de...@lists.xenproject.org
>>> Signed-off-by: Nadav Amit 
>>> ---
>>>  arch/x86/hyperv/mmu.c | 13 +++---
>>>  arch/x86/include/asm/paravirt.h   |  6 +--
>>>  arch/x86/include/asm/paravirt_types.h |  4 +-
>>>  arch/x86/include/asm/tlbflush.h   |  9 ++--
>>>  arch/x86/include/asm/trace/hyperv.h   |  2 +-
>>>  arch/x86/kernel/kvm.c | 11 +++--
>>>  arch/x86/kernel/paravirt.c|  2 +-
>>>  arch/x86/mm/tlb.c | 65 ---
>>>  arch/x86/xen/mmu_pv.c | 20 ++---
>>>  include/trace/events/xen.h|  2 +-
>>>  10 files changed, 91 insertions(+), 43 deletions(-)
>> ...
>>
>>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>>> index beb44e22afdf..19e481e6e904 100644
>>> --- a/arch/x86/xen/mmu_pv.c
>>> +++ b/arch/x86/xen/mmu_pv.c
>>> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>>>  	preempt_enable();
>>>  }
>>>
>>> -static void xen_flush_tlb_others(const struct cpumask *cpus,
>>> -				 const struct flush_tlb_info *info)
>>> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
>>> +				const struct flush_tlb_info *info)
>>>  {
>>>  	struct {
>>>  		struct mmuext_op op;
>>> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>  	const size_t mc_entry_size = sizeof(args->op) +
>>>  		sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>>>
>>> -	trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
>>> +	trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>>>
>>>  	if (cpumask_empty(cpus))
>>>  		return; /* nothing to do */
>>>
>>> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>  	args = mcs.args;
>>>  	args->op.arg2.vcpumask = to_cpumask(args->mask);
>>>
>>> -	/* Remove us, and any offline CPUS. */
>>> +	/* Flush locally if needed and remove us */
>>> +	if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
>>> +		local_irq_disable();
>>> +		flush_tlb_func_local(info);
>> I think this isn't the correct function for PV guests.
>>
>> In fact it should be much easier: just don't clear the own cpu from the
>> mask, that's all what's needed. The hypervisor is just fine having the
>> current cpu in the mask and it will do the right thing.
> Thanks. I will do so in v3. I don’t think Hyper-V people would want to do
> the same, unfortunately, since it would induce VM-exit on TLB flushes.

Why do you believe the vmexit matters?  You're talking one anyway for
the IPI.

Intel only have virtualised self-IPI, and while AMD do have working
non-self IPIs, you still take a vmexit anyway if any destination vcpu
isn't currently running in non-root mode (IIRC).

At that point, you might as well have the hypervisor do all the hard
work via a multi-cpu shootdown/flush hypercall, rather than trying to
arrange it locally.

~Andrew

Re: [PATCH v6 06/18] drm/virtio: remove ttm calls from in virtio_gpu_object_{reserve, unreserve}

2019-07-03 Thread Chia-I Wu
On Tue, Jul 2, 2019 at 7:19 AM Gerd Hoffmann  wrote:
>
> Call reservation_object_* directly instead
> of using ttm_bo_{reserve,unreserve}.
>
> v4: check for EINTR only.
> v3: check for EINTR too.
>
> Signed-off-by: Gerd Hoffmann 
> Reviewed-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 06cc0e961df6..07f6001ea91e 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -402,9 +402,9 @@ static inline int virtio_gpu_object_reserve(struct 
> virtio_gpu_object *bo)
>  {
> int r;
>
> -   r = ttm_bo_reserve(&bo->tbo, true, false, NULL);
> +   r = reservation_object_lock_interruptible(bo->gem_base.resv, NULL);
Can you elaborate a bit about how TTM keeps the BOs alive in, for
example, virtio_gpu_transfer_from_host_ioctl?  In that function, only
three TTM functions are called: ttm_bo_reserve, ttm_bo_validate, and
ttm_bo_unreserve.  I am curious how they keep the BO alive.

> if (unlikely(r != 0)) {
> -   if (r != -ERESTARTSYS) {
> +   if (r != -EINTR) {
> struct virtio_gpu_device *qdev =
> bo->gem_base.dev->dev_private;
> dev_err(qdev->dev, "%p reserve failed\n", bo);
> @@ -416,7 +416,7 @@ static inline int virtio_gpu_object_reserve(struct 
> virtio_gpu_object *bo)
>
>  static inline void virtio_gpu_object_unreserve(struct virtio_gpu_object *bo)
>  {
> -   ttm_bo_unreserve(&bo->tbo);
> +   reservation_object_unlock(bo->gem_base.resv);
>  }
>
>  /* virgl debufs */
> --
> 2.18.1
>


Re: [Xen-devel] [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

2019-07-03 Thread Nadav Amit via Virtualization
> On Jul 3, 2019, at 10:43 AM, Andrew Cooper  wrote:
> 
> On 03/07/2019 18:02, Nadav Amit wrote:
>>> On Jul 3, 2019, at 7:04 AM, Juergen Gross  wrote:
>>> 
>>> On 03.07.19 01:51, Nadav Amit wrote:
>>>> To improve TLB shootdown performance, flush the remote and local TLBs
>>>> concurrently. Introduce flush_tlb_multi() that does so. Introduce
>>>> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
>>>> and hyper-v are only compile-tested).
[...]
>>>> -	/* Remove us, and any offline CPUS. */
>>>> +	/* Flush locally if needed and remove us */
>>>> +	if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
>>>> +		local_irq_disable();
>>>> +		flush_tlb_func_local(info);
>>> I think this isn't the correct function for PV guests.
>>> 
>>> In fact it should be much easier: just don't clear the own cpu from the
>>> mask, that's all what's needed. The hypervisor is just fine having the
>>> current cpu in the mask and it will do the right thing.
>> Thanks. I will do so in v3. I don’t think Hyper-V people would want to do
>> the same, unfortunately, since it would induce VM-exit on TLB flushes.
> 
> Why do you believe the vmexit matters?  You're talking one anyway for
> the IPI.
> 
> Intel only have virtualised self-IPI, and while AMD do have working
> non-self IPIs, you still take a vmexit anyway if any destination vcpu
> isn't currently running in non-root mode (IIRC).
> 
> At that point, you might as well have the hypervisor do all the hard
> work via a multi-cpu shootdown/flush hypercall, rather than trying to
> arrange it locally.

I forgot that xen_flush_tlb_multi() should actually only be called when
there are some remote CPUs (as I optimized the case in which there is only a
single local CPU that needs to be flushed), so you are right.


Re: [PATCH v6 07/18] drm/virtio: add virtio_gpu_object_array & helpers

2019-07-03 Thread Chia-I Wu
On Tue, Jul 2, 2019 at 7:19 AM Gerd Hoffmann  wrote:
>
> Some helper functions to manage an array of gem objects.
>
> v6:
>  - add ticket to struct virtio_gpu_object_array.
>  - add virtio_gpu_array_{lock,unlock}_resv helpers.
>  - add virtio_gpu_array_add_fence helper.
> v5: some small optimizations (Chia-I Wu).
> v4: make them virtio-private instead of generic helpers.
>
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h | 17 ++
>  drivers/gpu/drm/virtio/virtgpu_gem.c | 83 
>  2 files changed, 100 insertions(+)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 07f6001ea91e..abb078a5dedf 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -84,6 +84,12 @@ struct virtio_gpu_object {
>  #define gem_to_virtio_gpu_obj(gobj) \
> container_of((gobj), struct virtio_gpu_object, gem_base)
>
> +struct virtio_gpu_object_array {
> +   struct ww_acquire_ctx ticket;
> +   u32 nents, total;
> +   struct drm_gem_object *objs[];
> +};
> +
>  struct virtio_gpu_vbuffer;
>  struct virtio_gpu_device;
>
> @@ -251,6 +257,17 @@ int virtio_gpu_mode_dumb_mmap(struct drm_file *file_priv,
>   struct drm_device *dev,
>   uint32_t handle, uint64_t *offset_p);
>
> +struct virtio_gpu_object_array *virtio_gpu_array_alloc(u32 nents);
> +struct virtio_gpu_object_array*
> +virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 
> nents);
> +void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
> + struct drm_gem_object *obj);
> +int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs);
> +void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs);
> +void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs,
> +   struct dma_fence *fence);
> +void virtio_gpu_array_put_free(struct virtio_gpu_object_array *objs);
> +
>  /* virtio vg */
>  int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev);
>  void virtio_gpu_free_vbufs(struct virtio_gpu_device *vgdev);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c 
> b/drivers/gpu/drm/virtio/virtgpu_gem.c
> index 9c9ad3b14080..e88df5e06d06 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_gem.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
> @@ -169,3 +169,86 @@ void virtio_gpu_gem_object_close(struct drm_gem_object 
> *obj,
> qobj->hw_res_handle);
> virtio_gpu_object_unreserve(qobj);
>  }
> +
> +struct virtio_gpu_object_array *virtio_gpu_array_alloc(u32 nents)
> +{
> +   struct virtio_gpu_object_array *objs;
> +   size_t size = sizeof(*objs) + sizeof(objs->objs[0]) * nents;
> +
> +   objs = kmalloc(size, GFP_KERNEL);
> +   if (!objs)
> +   return NULL;
> +
> +   objs->nents = 0;
> +   objs->total = nents;
> +   return objs;
> +}
> +
> +static void virtio_gpu_array_free(struct virtio_gpu_object_array *objs)
> +{
> +   kfree(objs);
> +}
> +
> +struct virtio_gpu_object_array*
> +virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 
> nents)
> +{
> +   struct virtio_gpu_object_array *objs;
> +   u32 i;
> +
> +   objs = virtio_gpu_array_alloc(nents);
> +   if (!objs)
> +   return NULL;
> +
> +   for (i = 0; i < nents; i++) {
> +   objs->nents = i;
This line can be moved into the if-block just below.
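i.e. something like this (sketch):

	for (i = 0; i < nents; i++) {
		objs->objs[i] = drm_gem_object_lookup(drm_file, handles[i]);
		if (!objs->objs[i]) {
			objs->nents = i;
			virtio_gpu_array_put_free(objs);
			return NULL;
		}
	}
	objs->nents = i;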
> +   objs->objs[i] = drm_gem_object_lookup(drm_file, handles[i]);
> +   if (!objs->objs[i]) {
> +   virtio_gpu_array_put_free(objs);
> +   return NULL;
> +   }
> +   }
> +   objs->nents = i;
> +   return objs;
> +}
> +
> +void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
> + struct drm_gem_object *obj)
> +{
> +   if (WARN_ON_ONCE(objs->nents == objs->total))
> +   return;
> +
> +   drm_gem_object_get(obj);
> +   objs->objs[objs->nents] = obj;
> +   objs->nents++;
> +}
> +
> +int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs)
> +{
> +   return drm_gem_lock_reservations(objs->objs, objs->nents,
> +&objs->ticket);
> +}
> +
> +void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs)
> +{
> +   drm_gem_unlock_reservations(objs->objs, objs->nents,
> +   &objs->ticket);
> +}
> +
> +void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs,
> +   struct dma_fence *fence)
> +{
> +   int i;
> +
> +   for (i = 0; i < objs->nents; i++)
> +   reservation_object_add_excl_fence(objs->objs[i]->resv,
> + fence);
> +}
> +
> +void 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Alex Williamson
On Wed,  3 Jul 2019 17:13:39 +0800
Tiwei Bie  wrote:
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 8f10748dac79..6c5718ab7eeb 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -201,6 +201,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3) /* vfio-amba device */
>  #define VFIO_DEVICE_FLAGS_CCW(1 << 4)/* vfio-ccw device */
>  #define VFIO_DEVICE_FLAGS_AP (1 << 5)/* vfio-ap device */
> +#define VFIO_DEVICE_FLAGS_VHOST  (1 << 6)/* vfio-vhost device */
>   __u32   num_regions;/* Max region index + 1 */
>   __u32   num_irqs;   /* Max IRQ index + 1 */
>  };
> @@ -217,6 +218,7 @@ struct vfio_device_info {
>  #define VFIO_DEVICE_API_AMBA_STRING  "vfio-amba"
>  #define VFIO_DEVICE_API_CCW_STRING   "vfio-ccw"
>  #define VFIO_DEVICE_API_AP_STRING"vfio-ap"
> +#define VFIO_DEVICE_API_VHOST_STRING "vfio-vhost"
>  
>  /**
>   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> @@ -573,6 +575,23 @@ enum {
>   VFIO_CCW_NUM_IRQS
>  };
>  
> +/*
> + * The vfio-vhost bus driver makes use of the following fixed region and
> + * IRQ index mapping. Unimplemented regions return a size of zero.
> + * Unimplemented IRQ types return a count of zero.
> + */
> +
> +enum {
> + VFIO_VHOST_CONFIG_REGION_INDEX,
> + VFIO_VHOST_NOTIFY_REGION_INDEX,
> + VFIO_VHOST_NUM_REGIONS
> +};
> +
> +enum {
> + VFIO_VHOST_VQ_IRQ_INDEX,
> + VFIO_VHOST_NUM_IRQS
> +};
> +

Note that the vfio API has evolved a bit since vfio-pci started this
way, with fixed indexes for pre-defined region types.  We now support
device specific regions which can be identified by a capability within
the REGION_INFO ioctl return data.  This allows a bit more flexibility,
at the cost of complexity, but the infrastructure already exists in
kernel and QEMU to make it relatively easy.  I think we'll have the
same support for interrupts soon too.  If you continue to pursue the
vfio-vhost direction you might want to consider these before committing
to fixed indexes.  Thanks,
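
For example, userspace could identify such a region roughly like this
(sketch only; the vhost type/subtype values here are made up):

static bool is_vhost_config_region(struct vfio_region_info *info)
{
	struct vfio_info_cap_header *hdr;

	if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
		return false;

	/* Walk the capability chain in the REGION_INFO return data. */
	for (hdr = (void *)info + info->cap_offset; hdr;
	     hdr = hdr->next ? (void *)info + hdr->next : NULL) {
		struct vfio_region_info_cap_type *cap = (void *)hdr;

		if (hdr->id == VFIO_REGION_INFO_CAP_TYPE &&
		    cap->type == 0xffff0001 /* hypothetical vhost type */ &&
		    cap->subtype == 0x1 /* hypothetical: config region */)
			return true;
	}
	return false;
}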

Alex


Re: [PATCH v6 08/18] drm/virtio: rework virtio_gpu_execbuffer_ioctl fencing

2019-07-03 Thread Chia-I Wu
On Tue, Jul 2, 2019 at 7:19 AM Gerd Hoffmann  wrote:
>
> Rework fencing workflow, starting with virtio_gpu_execbuffer_ioctl.
> Stop using ttm helpers, use the virtio_gpu_array_* helpers (which work
> on the reservation objects directly) instead.
>
> New workflow:
>
>  (1) All gem objects needed by a command are added to a
>  virtio_gpu_object_array.
>  (2) All reservation objects will be locked (virtio_gpu_array_lock_resv).
>  (3) virtio_gpu_fence_emit() completes fence initialization.
>  (4) fence gets added to the objects, reservation objects are unlocked
>  (virtio_gpu_array_add_fence, virtio_gpu_array_unlock_resv).
>  (5) virtio command is submitted to the host.
>  (6) The completion callback (virtio_gpu_dequeue_ctrl_func)
>  will drop object references and free virtio_gpu_object_array.
>
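(Restating the flow above for a single BO, as I read it; a sketch, not
the exact driver code:)

	struct virtio_gpu_object_array *objs;

	objs = virtio_gpu_array_alloc(1);		/* (1) */
	virtio_gpu_array_add_obj(objs, obj);
	virtio_gpu_array_lock_resv(objs);		/* (2) */
	virtio_gpu_fence_emit(vgdev, cmd_hdr, fence);	/* (3) */
	virtio_gpu_array_add_fence(objs, &fence->f);	/* (4) */
	virtio_gpu_array_unlock_resv(objs);
	vbuf->objs = objs;				/* (5); freed again in
							 * virtio_gpu_dequeue_ctrl_func (6) */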
> v6: rewrite most of the patch.
>
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h   |  6 ++-
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 56 +-
>  drivers/gpu/drm/virtio/virtgpu_vq.c| 21 +++---
>  3 files changed, 38 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index abb078a5dedf..98511d1dfff2 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -121,9 +121,9 @@ struct virtio_gpu_vbuffer {
>
> char *resp_buf;
> int resp_size;
> -
> virtio_gpu_resp_cb resp_cb;
>
> +   struct virtio_gpu_object_array *objs;
> struct list_head list;
>  };
>
> @@ -318,7 +318,9 @@ void virtio_gpu_cmd_context_detach_resource(struct 
> virtio_gpu_device *vgdev,
> uint32_t resource_id);
>  void virtio_gpu_cmd_submit(struct virtio_gpu_device *vgdev,
>void *data, uint32_t data_size,
> -  uint32_t ctx_id, struct virtio_gpu_fence *fence);
> +  uint32_t ctx_id,
> +  struct virtio_gpu_object_array *objs,
> +  struct virtio_gpu_fence *fence);
>  void virtio_gpu_cmd_transfer_from_host_3d(struct virtio_gpu_device *vgdev,
>   uint32_t resource_id, uint32_t 
> ctx_id,
>   uint64_t offset, uint32_t level,
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 0caff3fa623e..9735d7e5899b 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -105,16 +105,11 @@ static int virtio_gpu_execbuffer_ioctl(struct 
> drm_device *dev, void *data,
> struct drm_virtgpu_execbuffer *exbuf = data;
> struct virtio_gpu_device *vgdev = dev->dev_private;
> struct virtio_gpu_fpriv *vfpriv = drm_file->driver_priv;
> -   struct drm_gem_object *gobj;
> struct virtio_gpu_fence *out_fence;
> -   struct virtio_gpu_object *qobj;
> int ret;
> uint32_t *bo_handles = NULL;
> void __user *user_bo_handles = NULL;
> -   struct list_head validate_list;
> -   struct ttm_validate_buffer *buflist = NULL;
> -   int i;
> -   struct ww_acquire_ctx ticket;
> +   struct virtio_gpu_object_array *buflist = NULL;
> struct sync_file *sync_file;
> int in_fence_fd = exbuf->fence_fd;
> int out_fence_fd = -1;
> @@ -155,15 +150,10 @@ static int virtio_gpu_execbuffer_ioctl(struct 
> drm_device *dev, void *data,
> return out_fence_fd;
> }
>
> -   INIT_LIST_HEAD(&validate_list);
> if (exbuf->num_bo_handles) {
> -
> bo_handles = kvmalloc_array(exbuf->num_bo_handles,
> -  sizeof(uint32_t), GFP_KERNEL);
> -   buflist = kvmalloc_array(exbuf->num_bo_handles,
> -  sizeof(struct ttm_validate_buffer),
> -  GFP_KERNEL | __GFP_ZERO);
> -   if (!bo_handles || !buflist) {
> +   sizeof(uint32_t), GFP_KERNEL);
> +   if (!bo_handles) {
> ret = -ENOMEM;
> goto out_unused_fd;
> }
> @@ -175,25 +165,21 @@ static int virtio_gpu_execbuffer_ioctl(struct 
> drm_device *dev, void *data,
> goto out_unused_fd;
> }
>
> -   for (i = 0; i < exbuf->num_bo_handles; i++) {
> -   gobj = drm_gem_object_lookup(drm_file, bo_handles[i]);
> -   if (!gobj) {
> -   ret = -ENOENT;
> -   goto out_unused_fd;
> -   }
> -
> -   qobj = gem_to_virtio_gpu_obj(gobj);
> -   buflist[i].bo = &qobj->tbo;
> -
> -   list_add(&buflist[i].he

Re: [PATCH v2 06/35] crypto: Use kmemdup rather than duplicating its implementation

2019-07-03 Thread Michael S. Tsirkin
On Thu, Jul 04, 2019 at 12:27:08AM +0800, Fuqian Huang wrote:
> kmemdup is introduced to duplicate a region of memory in a neat way.
> Rather than kmalloc/kzalloc + memcpy, with which the programmer needs to
> write the size twice (sometimes leading to mistakes), kmemdup improves
> readability, leads to smaller code and also reduces the chances of mistakes.
> This patch suggests using kmemdup rather than kmalloc/kzalloc + memcpy.
> 
> Signed-off-by: Fuqian Huang 

virtio part:

Acked-by: Michael S. Tsirkin 


> ---
> Changes in v2:
>   - Fix a typo in commit message (memset -> memcpy)
> 
>  drivers/crypto/caam/caampkc.c  | 11 +++
>  drivers/crypto/virtio/virtio_crypto_algs.c |  4 +---
>  2 files changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/crypto/caam/caampkc.c b/drivers/crypto/caam/caampkc.c
> index fe24485274e1..a03464b4c019 100644
> --- a/drivers/crypto/caam/caampkc.c
> +++ b/drivers/crypto/caam/caampkc.c
> @@ -816,7 +816,7 @@ static int caam_rsa_set_pub_key(struct crypto_akcipher 
> *tfm, const void *key,
>   return ret;
>  
>   /* Copy key in DMA zone */
> - rsa_key->e = kzalloc(raw_key.e_sz, GFP_DMA | GFP_KERNEL);
> + rsa_key->e = kmemdup(raw_key.e, raw_key.e_sz, GFP_DMA | GFP_KERNEL);
>   if (!rsa_key->e)
>   goto err;
>  
> @@ -838,8 +838,6 @@ static int caam_rsa_set_pub_key(struct crypto_akcipher 
> *tfm, const void *key,
>   rsa_key->e_sz = raw_key.e_sz;
>   rsa_key->n_sz = raw_key.n_sz;
>  
> - memcpy(rsa_key->e, raw_key.e, raw_key.e_sz);
> -
>   return 0;
>  err:
>   caam_rsa_free_key(rsa_key);
> @@ -920,11 +918,11 @@ static int caam_rsa_set_priv_key(struct crypto_akcipher 
> *tfm, const void *key,
>   return ret;
>  
>   /* Copy key in DMA zone */
> - rsa_key->d = kzalloc(raw_key.d_sz, GFP_DMA | GFP_KERNEL);
> + rsa_key->d = kmemdup(raw_key.d, raw_key.d_sz, GFP_DMA | GFP_KERNEL);
>   if (!rsa_key->d)
>   goto err;
>  
> - rsa_key->e = kzalloc(raw_key.e_sz, GFP_DMA | GFP_KERNEL);
> + rsa_key->e = kmemdup(raw_key.e, raw_key.e_sz, GFP_DMA | GFP_KERNEL);
>   if (!rsa_key->e)
>   goto err;
>  
> @@ -947,9 +945,6 @@ static int caam_rsa_set_priv_key(struct crypto_akcipher 
> *tfm, const void *key,
>   rsa_key->e_sz = raw_key.e_sz;
>   rsa_key->n_sz = raw_key.n_sz;
>  
> - memcpy(rsa_key->d, raw_key.d, raw_key.d_sz);
> - memcpy(rsa_key->e, raw_key.e, raw_key.e_sz);
> -
>   caam_rsa_set_priv_key_form(ctx, &raw_key);
>  
>   return 0;
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> index 10f266d462d6..42d19205166b 100644
> --- a/drivers/crypto/virtio/virtio_crypto_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -129,13 +129,11 @@ static int virtio_crypto_alg_ablkcipher_init_session(
>* Avoid to do DMA from the stack, switch to using
>* dynamically-allocated for the key
>*/
> - uint8_t *cipher_key = kmalloc(keylen, GFP_ATOMIC);
> + uint8_t *cipher_key = kmemdup(key, keylen, GFP_ATOMIC);
>  
>   if (!cipher_key)
>   return -ENOMEM;
>  
> - memcpy(cipher_key, key, keylen);
> -
>   spin_lock(&vcrypto->ctrl_lock);
>   /* Pad ctrl header */
>   vcrypto->ctrl.header.opcode =
> -- 
> 2.11.0


Re: [PATCH 0/5] Unmappable DRM client buffers for fbdev emulation

2019-07-03 Thread Noralf Trønnes



Den 03.07.2019 10.32, skrev Thomas Zimmermann:
> DRM client buffers are permanently mapped throughout their lifetime. This
> prevents us from using generic framebuffer emulation for devices with
> small dedicated video memory, such as ast or mgag200. With fb buffers
> permanently mapped, such devices often won't have enough space left to
> display other content (e.g., X11).
> 
> This patch set introduces unmappable DRM client buffers for framebuffer
> emulation with shadow buffers. While the shadow buffer remains in system
> memory permanently, the respective buffer object will only be mapped briefly
> during updates from the shadow buffer. Hence, the driver can relocate the
> buffer object among memory regions as needed.
> 
> The default behaviour for DRM client buffers is still to be permanently
> mapped.
> 
> The patch set converts ast and mgag200 to generic framebuffer emulation
> and removes a large amount of framebuffer code from these drivers. For
> bochs, a problem was reported where the driver could not display the console
> because it was pinned in system memory. [1] The patch set fixes this bug
> by converting bochs to use the shadow fb.
> 
> The patch set has been tested on ast and mgag200 HW.
> 

I've been thinking, this enables dirty tracking in DRM userspace for
these drivers since the dirty callback is now set on all framebuffers.
Is this OK? Should we add a drm_fbdev_generic_shadow_setup() that sets a
flag enabling the shadow buffer instead?
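
A minimal sketch of that opt-in idea (the prefer_shadow flag is an
assumption here, not an existing mode_config field):

static int drm_fbdev_generic_shadow_setup(struct drm_device *dev,
					  unsigned int preferred_bpp)
{
	/* Hypothetical: mark the device as wanting a shadow-buffered
	 * fbdev console before the generic setup runs. */
	dev->mode_config.prefer_shadow = true;

	return drm_fbdev_generic_setup(dev, preferred_bpp);
}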

Really nice diffstat by the way :-)

Noralf.

> [1] https://lists.freedesktop.org/archives/dri-devel/2019-June/224423.html
> 
> Thomas Zimmermann (5):
>   drm/client: Support unmapping of DRM client buffers
>   drm/fb-helper: Unmap BO for shadow-buffered framebuffer console
>   drm/ast: Replace struct ast_fbdev with generic framebuffer emulation
>   drm/bochs: Use shadow buffer for bochs framebuffer console
>   drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> emulation
> 
>  drivers/gpu/drm/ast/Makefile   |   2 +-
>  drivers/gpu/drm/ast/ast_drv.c  |  22 +-
>  drivers/gpu/drm/ast/ast_drv.h  |  17 --
>  drivers/gpu/drm/ast/ast_fb.c   | 341 -
>  drivers/gpu/drm/ast/ast_main.c |  30 ++-
>  drivers/gpu/drm/ast/ast_mode.c |  21 --
>  drivers/gpu/drm/bochs/bochs_kms.c  |   2 +-
>  drivers/gpu/drm/drm_client.c   |  71 -
>  drivers/gpu/drm/drm_fb_helper.c|  14 +-
>  drivers/gpu/drm/mgag200/Makefile   |   2 +-
>  drivers/gpu/drm/mgag200/mgag200_drv.h  |  19 --
>  drivers/gpu/drm/mgag200/mgag200_fb.c   | 309 --
>  drivers/gpu/drm/mgag200/mgag200_main.c |  61 +++--
>  drivers/gpu/drm/mgag200/mgag200_mode.c |  27 --
>  include/drm/drm_client.h   |   3 +
>  15 files changed, 154 insertions(+), 787 deletions(-)
>  delete mode 100644 drivers/gpu/drm/ast/ast_fb.c
>  delete mode 100644 drivers/gpu/drm/mgag200/mgag200_fb.c
> 
> --
> 2.21.0
> 


Re: [PATCH v6 07/18] drm/virtio: add virtio_gpu_object_array & helpers

2019-07-03 Thread Chia-I Wu
On Wed, Jul 3, 2019 at 11:31 AM Chia-I Wu  wrote:
>
> On Tue, Jul 2, 2019 at 7:19 AM Gerd Hoffmann  wrote:
> >
> > Some helper functions to manage an array of gem objects.
> >
> > v6:
> >  - add ticket to struct virtio_gpu_object_array.
> >  - add virtio_gpu_array_{lock,unlock}_resv helpers.
> >  - add virtio_gpu_array_add_fence helper.
> > v5: some small optimizations (Chia-I Wu).
> > v4: make them virtio-private instead of generic helpers.
> >
> > Signed-off-by: Gerd Hoffmann 
> > ---
> >  drivers/gpu/drm/virtio/virtgpu_drv.h | 17 ++
> >  drivers/gpu/drm/virtio/virtgpu_gem.c | 83 
> >  2 files changed, 100 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> > b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > index 07f6001ea91e..abb078a5dedf 100644
> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > @@ -84,6 +84,12 @@ struct virtio_gpu_object {
> >  #define gem_to_virtio_gpu_obj(gobj) \
> > container_of((gobj), struct virtio_gpu_object, gem_base)
> >
> > +struct virtio_gpu_object_array {
> > +   struct ww_acquire_ctx ticket;
> > +   u32 nents, total;
> > +   struct drm_gem_object *objs[];
> > +};
> > +
> >  struct virtio_gpu_vbuffer;
> >  struct virtio_gpu_device;
> >
> > @@ -251,6 +257,17 @@ int virtio_gpu_mode_dumb_mmap(struct drm_file 
> > *file_priv,
> >   struct drm_device *dev,
> >   uint32_t handle, uint64_t *offset_p);
> >
> > +struct virtio_gpu_object_array *virtio_gpu_array_alloc(u32 nents);
> > +struct virtio_gpu_object_array*
> > +virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 
> > nents);
> > +void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
> > + struct drm_gem_object *obj);
> > +int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs);
> > +void virtio_gpu_array_unlock_resv(struct virtio_gpu_object_array *objs);
> > +void virtio_gpu_array_add_fence(struct virtio_gpu_object_array *objs,
> > +   struct dma_fence *fence);
> > +void virtio_gpu_array_put_free(struct virtio_gpu_object_array *objs);
> > +
> >  /* virtio vg */
> >  int virtio_gpu_alloc_vbufs(struct virtio_gpu_device *vgdev);
> >  void virtio_gpu_free_vbufs(struct virtio_gpu_device *vgdev);
> > diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c 
> > b/drivers/gpu/drm/virtio/virtgpu_gem.c
> > index 9c9ad3b14080..e88df5e06d06 100644
> > --- a/drivers/gpu/drm/virtio/virtgpu_gem.c
> > +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c
> > @@ -169,3 +169,86 @@ void virtio_gpu_gem_object_close(struct drm_gem_object 
> > *obj,
> > qobj->hw_res_handle);
> > virtio_gpu_object_unreserve(qobj);
> >  }
> > +
> > +struct virtio_gpu_object_array *virtio_gpu_array_alloc(u32 nents)
> > +{
> > +   struct virtio_gpu_object_array *objs;
> > +   size_t size = sizeof(*objs) + sizeof(objs->objs[0]) * nents;
> > +
> > +   objs = kmalloc(size, GFP_KERNEL);
> > +   if (!objs)
> > +   return NULL;
> > +
> > +   objs->nents = 0;
> > +   objs->total = nents;
> > +   return objs;
> > +}
> > +
> > +static void virtio_gpu_array_free(struct virtio_gpu_object_array *objs)
> > +{
> > +   kfree(objs);
> > +}
> > +
> > +struct virtio_gpu_object_array*
> > +virtio_gpu_array_from_handles(struct drm_file *drm_file, u32 *handles, u32 
> > nents)
> > +{
> > +   struct virtio_gpu_object_array *objs;
> > +   u32 i;
> > +
> > +   objs = virtio_gpu_array_alloc(nents);
> > +   if (!objs)
> > +   return NULL;
> > +
> > +   for (i = 0; i < nents; i++) {
> > +   objs->nents = i;
> This line can be moved into the if-block just below.
> > +   objs->objs[i] = drm_gem_object_lookup(drm_file, handles[i]);
> > +   if (!objs->objs[i]) {
> > +   virtio_gpu_array_put_free(objs);
> > +   return NULL;
> > +   }
> > +   }
> > +   objs->nents = i;
> > +   return objs;
> > +}
> > +
> > +void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
> > + struct drm_gem_object *obj)
> > +{
> > +   if (WARN_ON_ONCE(objs->nents == objs->total))
> > +   return;
> > +
> > +   drm_gem_object_get(obj);
> > +   objs->objs[objs->nents] = obj;
> > +   objs->nents++;
> > +}
> > +
> > +int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs)
> > +{
> > +   return drm_gem_lock_reservations(objs->objs, objs->nents,
> > +&objs->ticket);
Unlike in other drivers where an "object array" is only needed in
execbuffer, we will use this in several places, and often with only 1
object in the array.  Can we special case that and do a quick
reservation_object_lock?

> > +}
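
E.g. a sketch of such a fast path (assuming the unlock side gets the same
special case):

int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs)
{
	/* Fast path: a single object needs no ww acquire ticket. */
	if (objs->nents == 1)
		return reservation_object_lock(objs->objs[0]->resv, NULL);

	return drm_gem_lock_reservations(objs->objs, objs->nents,
					 &objs->ticket);
}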
> > +
> > +void virtio_gpu_array_

Re: [PATCH v6 15/18] drm/virtio: rework virtio_gpu_transfer_to_host_ioctl fencing

2019-07-03 Thread Chia-I Wu
On Tue, Jul 2, 2019 at 7:19 AM Gerd Hoffmann  wrote:
>
> Switch to the virtio_gpu_array_* helper workflow.
>
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h   |  2 +-
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 43 --
>  drivers/gpu/drm/virtio/virtgpu_vq.c|  5 ++-
>  3 files changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 4df760ba018e..b1f63a21abb6 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -308,10 +308,10 @@ void virtio_gpu_cmd_transfer_from_host_3d(struct 
> virtio_gpu_device *vgdev,
>   struct virtio_gpu_object_array 
> *objs,
>   struct virtio_gpu_fence *fence);
>  void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
> -   struct virtio_gpu_object *bo,
> uint32_t ctx_id,
> uint64_t offset, uint32_t level,
> struct virtio_gpu_box *box,
> +   struct virtio_gpu_object_array *objs,
> struct virtio_gpu_fence *fence);
>  void
>  virtio_gpu_cmd_resource_create_3d(struct virtio_gpu_device *vgdev,
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 56182abdbf36..b220918df6a1 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -341,47 +341,44 @@ static int virtio_gpu_transfer_to_host_ioctl(struct 
> drm_device *dev, void *data,
> struct virtio_gpu_device *vgdev = dev->dev_private;
> struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
> struct drm_virtgpu_3d_transfer_to_host *args = data;
> -   struct drm_gem_object *gobj = NULL;
> -   struct virtio_gpu_object *qobj = NULL;
> +   struct virtio_gpu_object_array *objs;
> struct virtio_gpu_fence *fence;
> struct virtio_gpu_box box;
> int ret;
> u32 offset = args->offset;
>
> -   gobj = drm_gem_object_lookup(file, args->bo_handle);
> -   if (gobj == NULL)
> +   objs = virtio_gpu_array_from_handles(file, &args->bo_handle, 1);
> +   if (objs == NULL)
> return -ENOENT;
>
> -   qobj = gem_to_virtio_gpu_obj(gobj);
> -
> -   ret = virtio_gpu_object_reserve(qobj);
> -   if (ret)
> -   goto out;
> -
> convert_to_hw_box(&box, &args->box);
> if (!vgdev->has_virgl_3d) {
> virtio_gpu_cmd_transfer_to_host_2d
> -   (vgdev, qobj, offset,
> +   (vgdev, gem_to_virtio_gpu_obj(objs->objs[0]), offset,
>  box.w, box.h, box.x, box.y, NULL);
> +   virtio_gpu_array_put_free(objs);
Don't we need this in the non-3D case as well?
> } else {
> +   ret = virtio_gpu_array_lock_resv(objs);
> +   if (ret != 0)
> +   goto err_put_free;
> +
> +   ret = -ENOMEM;
> fence = virtio_gpu_fence_alloc(vgdev);
> -   if (!fence) {
> -   ret = -ENOMEM;
> -   goto out_unres;
> -   }
> +   if (!fence)
> +   goto err_unlock;
> +
> virtio_gpu_cmd_transfer_to_host_3d
> -   (vgdev, qobj,
> +   (vgdev,
>  vfpriv ? vfpriv->ctx_id : 0, offset,
> -args->level, &box, fence);
> -   reservation_object_add_excl_fence(qobj->base.base.resv,
> - &fence->f);
> +args->level, &box, objs, fence);
> dma_fence_put(&fence->f);
> }
> +   return 0;
>
> -out_unres:
> -   virtio_gpu_object_unreserve(qobj);
> -out:
> -   drm_gem_object_put_unlocked(gobj);
> +err_unlock:
> +   virtio_gpu_array_unlock_resv(objs);
> +err_put_free:
> +   virtio_gpu_array_put_free(objs);
> return ret;
>  }
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
> b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index bef7036f4508..1c0097de419a 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -899,12 +899,13 @@ virtio_gpu_cmd_resource_create_3d(struct 
> virtio_gpu_device *vgdev,
>  }
>
>  void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
> -   struct virtio_gpu_object *bo,
> uint32_t ctx_id,
> uint64_t offset, uint32_t level,
> struct virtio_gpu_box *box,
> + 

Re: [PATCH v6 14/18] drm/virtio: rework virtio_gpu_transfer_from_host_ioctl fencing

2019-07-03 Thread Chia-I Wu
On Tue, Jul 2, 2019 at 7:19 AM Gerd Hoffmann  wrote:
>
> Switch to the virtio_gpu_array_* helper workflow.
(just repeating my question on patch 6)

Does this fix the obj refcount issue?  When was the issue introduced?

>
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 ++-
>  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 35 +++---
>  drivers/gpu/drm/virtio/virtgpu_vq.c|  8 --
>  3 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 78dc5a19a358..4df760ba018e 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -302,9 +302,10 @@ void virtio_gpu_cmd_submit(struct virtio_gpu_device 
> *vgdev,
>struct virtio_gpu_object_array *objs,
>struct virtio_gpu_fence *fence);
>  void virtio_gpu_cmd_transfer_from_host_3d(struct virtio_gpu_device *vgdev,
> - uint32_t resource_id, uint32_t 
> ctx_id,
> + uint32_t ctx_id,
>   uint64_t offset, uint32_t level,
>   struct virtio_gpu_box *box,
> + struct virtio_gpu_object_array 
> *objs,
>   struct virtio_gpu_fence *fence);
>  void virtio_gpu_cmd_transfer_to_host_3d(struct virtio_gpu_device *vgdev,
> struct virtio_gpu_object *bo,
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 0d0acf0b85ed..56182abdbf36 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -298,8 +298,7 @@ static int virtio_gpu_transfer_from_host_ioctl(struct 
> drm_device *dev,
> struct virtio_gpu_device *vgdev = dev->dev_private;
> struct virtio_gpu_fpriv *vfpriv = file->driver_priv;
> struct drm_virtgpu_3d_transfer_from_host *args = data;
> -   struct drm_gem_object *gobj = NULL;
> -   struct virtio_gpu_object *qobj = NULL;
> +   struct virtio_gpu_object_array *objs;
> struct virtio_gpu_fence *fence;
> int ret;
> u32 offset = args->offset;
> @@ -308,35 +307,31 @@ static int virtio_gpu_transfer_from_host_ioctl(struct 
> drm_device *dev,
> if (vgdev->has_virgl_3d == false)
> return -ENOSYS;
>
> -   gobj = drm_gem_object_lookup(file, args->bo_handle);
> -   if (gobj == NULL)
> +   objs = virtio_gpu_array_from_handles(file, &args->bo_handle, 1);
> +   if (objs == NULL)
> return -ENOENT;
>
> -   qobj = gem_to_virtio_gpu_obj(gobj);
> -
> -   ret = virtio_gpu_object_reserve(qobj);
> -   if (ret)
> -   goto out;
> +   ret = virtio_gpu_array_lock_resv(objs);
> +   if (ret != 0)
> +   goto err_put_free;
>
> convert_to_hw_box(&box, &args->box);
>
> fence = virtio_gpu_fence_alloc(vgdev);
> if (!fence) {
> ret = -ENOMEM;
> -   goto out_unres;
> +   goto err_unlock;
> }
> virtio_gpu_cmd_transfer_from_host_3d
> -   (vgdev, qobj->hw_res_handle,
> -vfpriv->ctx_id, offset, args->level,
> -&box, fence);
> -   reservation_object_add_excl_fence(qobj->base.base.resv,
> - &fence->f);
> -
> +   (vgdev, vfpriv->ctx_id, offset, args->level,
> +&box, objs, fence);
> dma_fence_put(&fence->f);
> -out_unres:
> -   virtio_gpu_object_unreserve(qobj);
> -out:
> -   drm_gem_object_put_unlocked(gobj);
> +   return 0;
> +
> +err_unlock:
> +   virtio_gpu_array_unlock_resv(objs);
> +err_put_free:
> +   virtio_gpu_array_put_free(objs);
> return ret;
>  }
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c 
> b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index fc908d5cb217..bef7036f4508 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -928,20 +928,24 @@ void virtio_gpu_cmd_transfer_to_host_3d(struct 
> virtio_gpu_device *vgdev,
>  }
>
>  void virtio_gpu_cmd_transfer_from_host_3d(struct virtio_gpu_device *vgdev,
> - uint32_t resource_id, uint32_t 
> ctx_id,
> + uint32_t ctx_id,
>   uint64_t offset, uint32_t level,
>   struct virtio_gpu_box *box,
> + struct virtio_gpu_object_array 
> *objs,
>   struct virtio_gpu_fence *fence)
>  {
> +   struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]);
> struct virtio_gpu_transfe

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Wed, Jul 03, 2019 at 12:31:57PM -0600, Alex Williamson wrote:
> On Wed,  3 Jul 2019 17:13:39 +0800
> Tiwei Bie  wrote:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 8f10748dac79..6c5718ab7eeb 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -201,6 +201,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)   /* vfio-amba device */
> >  #define VFIO_DEVICE_FLAGS_CCW  (1 << 4)/* vfio-ccw device */
> >  #define VFIO_DEVICE_FLAGS_AP   (1 << 5)/* vfio-ap device */
> > +#define VFIO_DEVICE_FLAGS_VHOST(1 << 6)/* vfio-vhost device */
> > __u32   num_regions;/* Max region index + 1 */
> > __u32   num_irqs;   /* Max IRQ index + 1 */
> >  };
> > @@ -217,6 +218,7 @@ struct vfio_device_info {
> >  #define VFIO_DEVICE_API_AMBA_STRING"vfio-amba"
> >  #define VFIO_DEVICE_API_CCW_STRING "vfio-ccw"
> >  #define VFIO_DEVICE_API_AP_STRING  "vfio-ap"
> > +#define VFIO_DEVICE_API_VHOST_STRING   "vfio-vhost"
> >  
> >  /**
> >   * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
> > @@ -573,6 +575,23 @@ enum {
> > VFIO_CCW_NUM_IRQS
> >  };
> >  
> > +/*
> > + * The vfio-vhost bus driver makes use of the following fixed region and
> > + * IRQ index mapping. Unimplemented regions return a size of zero.
> > + * Unimplemented IRQ types return a count of zero.
> > + */
> > +
> > +enum {
> > +   VFIO_VHOST_CONFIG_REGION_INDEX,
> > +   VFIO_VHOST_NOTIFY_REGION_INDEX,
> > +   VFIO_VHOST_NUM_REGIONS
> > +};
> > +
> > +enum {
> > +   VFIO_VHOST_VQ_IRQ_INDEX,
> > +   VFIO_VHOST_NUM_IRQS
> > +};
> > +
> 
> Note that the vfio API has evolved a bit since vfio-pci started this
> way, with fixed indexes for pre-defined region types.  We now support
> device specific regions which can be identified by a capability within
> the REGION_INFO ioctl return data.  This allows a bit more flexibility,
> at the cost of complexity, but the infrastructure already exists in
> kernel and QEMU to make it relatively easy.  I think we'll have the
> same support for interrupts soon too.  If you continue to pursue the
> vfio-vhost direction you might want to consider these before committing
> to fixed indexes.  Thanks,
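
For reference, a userspace sketch of that lookup against the standard vfio
uAPI (the helper name is ours; allocation-failure handling is elided):

#include <stdbool.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Check whether a region carries a device-specific type capability
 * matching (type, subtype). */
static bool region_has_type(int device_fd, __u32 index,
			    __u32 type, __u32 subtype)
{
	struct vfio_region_info *info = calloc(1, sizeof(*info));
	bool match = false;
	__u32 off;

	info->argsz = sizeof(*info);
	info->index = index;
	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info))
		goto out;

	if (info->argsz > sizeof(*info)) {
		/* Capability chain present: the kernel wrote the required
		 * size into info->argsz, so re-fetch with room for it. */
		info = realloc(info, info->argsz);
		if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info))
			goto out;
	}

	if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
		goto out;

	off = info->cap_offset;
	while (off) {
		struct vfio_info_cap_header *hdr = (void *)info + off;

		if (hdr->id == VFIO_REGION_INFO_CAP_TYPE) {
			struct vfio_region_info_cap_type *cap = (void *)hdr;

			match = cap->type == type && cap->subtype == subtype;
			break;
		}
		off = hdr->next;
	}
out:
	free(info);
	return match;
}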

Thanks for the details! Will give it a try!

Thanks,
Tiwei


Re: [PATCH v2 1/3] vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock

2019-07-03 Thread Jason Wang


On 2019/7/3 6:41 PM, Stefano Garzarella wrote:

On Wed, Jul 03, 2019 at 05:53:58PM +0800, Jason Wang wrote:

On 2019/6/28 8:36 PM, Stefano Garzarella wrote:

Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks have finished or not.

To solve this issue we move the assignment of the_virtio_vsock to the
end of .probe(), when we have finished all the initialization, and to
the beginning of .remove(), before releasing resources.
For the same reason, we do the same for vdev->priv.

We use RCU to be sure that all callbacks that use the_virtio_vsock
have ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they have ended and will no longer be invoked.

We also take the mutex during the .remove() to prevent .probe() from
running while we are resetting the device.

Signed-off-by: Stefano Garzarella 
---
   net/vmw_vsock/virtio_transport.c | 67 +---
   1 file changed, 44 insertions(+), 23 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 9c287e3e393c..7ad510ec12e0 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -65,19 +65,22 @@ struct virtio_vsock {
u32 guest_cid;
   };
-static struct virtio_vsock *virtio_vsock_get(void)
-{
-   return the_virtio_vsock;
-}
-
   static u32 virtio_transport_get_local_cid(void)
   {
-   struct virtio_vsock *vsock = virtio_vsock_get();
+   struct virtio_vsock *vsock;
+   u32 ret;
-   if (!vsock)
-   return VMADDR_CID_ANY;
+   rcu_read_lock();
+   vsock = rcu_dereference(the_virtio_vsock);
+   if (!vsock) {
+   ret = VMADDR_CID_ANY;
+   goto out_rcu;
+   }
-   return vsock->guest_cid;
+   ret = vsock->guest_cid;
+out_rcu:
+   rcu_read_unlock();
+   return ret;
   }
   static void virtio_transport_loopback_work(struct work_struct *work)
@@ -197,14 +200,18 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
struct virtio_vsock *vsock;
int len = pkt->len;
-   vsock = virtio_vsock_get();
+   rcu_read_lock();
+   vsock = rcu_dereference(the_virtio_vsock);
if (!vsock) {
virtio_transport_free_pkt(pkt);
-   return -ENODEV;
+   len = -ENODEV;
+   goto out_rcu;
}
-   if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid)
-   return virtio_transport_send_pkt_loopback(vsock, pkt);
+   if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
+   len = virtio_transport_send_pkt_loopback(vsock, pkt);
+   goto out_rcu;
+   }
if (pkt->reply)
atomic_inc(&vsock->queued_replies);
@@ -214,6 +221,9 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
spin_unlock_bh(&vsock->send_pkt_list_lock);
queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
+
+out_rcu:
+   rcu_read_unlock();
return len;
   }
@@ -222,12 +232,14 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
   {
struct virtio_vsock *vsock;
struct virtio_vsock_pkt *pkt, *n;
-   int cnt = 0;
+   int cnt = 0, ret;
LIST_HEAD(freeme);
-   vsock = virtio_vsock_get();
+   rcu_read_lock();
+   vsock = rcu_dereference(the_virtio_vsock);
if (!vsock) {
-   return -ENODEV;
+   ret = -ENODEV;
+   goto out_rcu;
}
spin_lock_bh(&vsock->send_pkt_list_lock);
@@ -255,7 +267,11 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
queue_work(virtio_vsock_workqueue, &vsock->rx_work);
}
-   return 0;
+   ret = 0;
+
+out_rcu:
+   rcu_read_unlock();
+   return ret;
   }
   static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
@@ -590,8 +606,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
vsock->rx_buf_max_nr = 0;
atomic_set(&vsock->queued_replies, 0);
-   vdev->priv = vsock;
-   the_virtio_vsock = vsock;
mutex_init(&vsock->tx_lock);
mutex_init(&vsock->rx_lock);
mutex_init(&vsock->event_lock);
@@ -613,6 +627,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
virtio_vsock_event_fill(vsock);
mutex_unlock(&vsock->event_lock);
+   vdev->priv = vsock;
+   rcu_assign_pointer(the_virtio_vsock, vsock);


You probably need to use rcu_dereference_protected() to access
the_virtio_vsock in this function in order to keep sparse happy.


Ooo, thanks!

Do you mean when we check if the_virtio_vsock is not null at the beginning of
virtio_vsock_probe()?



I mean instead of:

    /* Only one virtio-vsock device per guest is supported */
    if (the_virtio_vsock) {
        ret = -EBUSY;
        goto out;
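
i.e. something along these lines (a sketch; it assumes the check in
.probe() runs while the_virtio_vsock_mutex is held, so
rcu_dereference_protected() can name that lock):

	/* Only one virtio-vsock device per guest is supported */
	if (rcu_dereference_protected(the_virtio_vsock,
				      lockdep_is_held(&the_virtio_vsock_mutex))) {
		ret = -EBUSY;
		goto out;
	}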
  

Re: [PATCH v2 2/3] vsock/virtio: stop workers during the .remove()

2019-07-03 Thread Jason Wang


On 2019/6/28 8:36 PM, Stefano Garzarella wrote:

Before calling vdev->config->reset(vdev) we need to be sure that
no one is accessing the device; for this reason, we add new variables
to the struct virtio_vsock to stop the workers during the .remove().

This patch also adds a few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).

Suggested-by: Stefan Hajnoczi 
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---
  net/vmw_vsock/virtio_transport.c | 51 +++-
  1 file changed, 50 insertions(+), 1 deletion(-)



This should work. But in the future we may consider converting
the_virtio_vsock to a socket object and using the socket refcnt and
destructor instead of inventing something new ourselves.


Thanks




diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 7ad510ec12e0..1b44ec6f3f6c 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -38,6 +38,7 @@ struct virtio_vsock {
 * must be accessed with tx_lock held.
 */
struct mutex tx_lock;
+   bool tx_run;
  
  	struct work_struct send_pkt_work;

spinlock_t send_pkt_list_lock;
@@ -53,6 +54,7 @@ struct virtio_vsock {
 * must be accessed with rx_lock held.
 */
struct mutex rx_lock;
+   bool rx_run;
int rx_buf_nr;
int rx_buf_max_nr;
  
@@ -60,6 +62,7 @@ struct virtio_vsock {

 * vqs[VSOCK_VQ_EVENT] must be accessed with event_lock held.
 */
struct mutex event_lock;
+   bool event_run;
struct virtio_vsock_event event_list[8];
  
  	u32 guest_cid;

@@ -94,6 +97,10 @@ static void virtio_transport_loopback_work(struct 
work_struct *work)
spin_unlock_bh(&vsock->loopback_list_lock);
  
  	mutex_lock(&vsock->rx_lock);

+
+   if (!vsock->rx_run)
+   goto out;
+
while (!list_empty(&pkts)) {
struct virtio_vsock_pkt *pkt;
  
@@ -102,6 +109,7 @@ static void virtio_transport_loopback_work(struct work_struct *work)
  
  		virtio_transport_recv_pkt(pkt);

}
+out:
mutex_unlock(&vsock->rx_lock);
  }
  
@@ -130,6 +138,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
  
  	mutex_lock(&vsock->tx_lock);
  
+	if (!vsock->tx_run)

+   goto out;
+
vq = vsock->vqs[VSOCK_VQ_TX];
  
  	for (;;) {

@@ -188,6 +199,7 @@ virtio_transport_send_pkt_work(struct work_struct *work)
if (added)
virtqueue_kick(vq);
  
+out:

mutex_unlock(&vsock->tx_lock);
  
  	if (restart_rx)

@@ -323,6 +335,10 @@ static void virtio_transport_tx_work(struct work_struct 
*work)
  
  	vq = vsock->vqs[VSOCK_VQ_TX];

mutex_lock(&vsock->tx_lock);
+
+   if (!vsock->tx_run)
+   goto out;
+
do {
struct virtio_vsock_pkt *pkt;
unsigned int len;
@@ -333,6 +349,8 @@ static void virtio_transport_tx_work(struct work_struct 
*work)
added = true;
}
} while (!virtqueue_enable_cb(vq));
+
+out:
mutex_unlock(&vsock->tx_lock);
  
  	if (added)

@@ -361,6 +379,9 @@ static void virtio_transport_rx_work(struct work_struct 
*work)
  
  	mutex_lock(&vsock->rx_lock);
  
+	if (!vsock->rx_run)

+   goto out;
+
do {
virtqueue_disable_cb(vq);
for (;;) {
@@ -470,6 +491,9 @@ static void virtio_transport_event_work(struct work_struct 
*work)
  
  	mutex_lock(&vsock->event_lock);
  
+	if (!vsock->event_run)

+   goto out;
+
do {
struct virtio_vsock_event *event;
unsigned int len;
@@ -484,7 +508,7 @@ static void virtio_transport_event_work(struct work_struct 
*work)
} while (!virtqueue_enable_cb(vq));
  
  	virtqueue_kick(vsock->vqs[VSOCK_VQ_EVENT]);

-
+out:
mutex_unlock(&vsock->event_lock);
  }
  
@@ -619,12 +643,18 @@ static int virtio_vsock_probe(struct virtio_device *vdev)

INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
INIT_WORK(&vsock->loopback_work, virtio_transport_loopback_work);
  
+	mutex_lock(&vsock->tx_lock);

+   vsock->tx_run = true;
+   mutex_unlock(&vsock->tx_lock);
+
mutex_lock(&vsock->rx_lock);
virtio_vsock_rx_fill(vsock);
+   vsock->rx_run = true;
mutex_unlock(&vsock->rx_lock);
  
  	mutex_lock(&vsock->event_lock);

virtio_vsock_event_fill(vsock);
+   vsock->event_run = true;
mutex_unlock(&vsock->event_lock);
  
  	vdev->priv = vsock;

@@ -659,6 +689,24 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
/* Reset all connected sockets when the device disappear */
vsock_for_each_connected_socket(virtio_vsock_reset_sock);
  
+	/* Stop all work handlers to make sure no one is accessing the device,

+* so we can safely call vdev->config->reset().
+*/
+   mutex_lock(&vsock->rx_lock);
+  
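
Condensed, the remove-side sequence the patch adds looks roughly like this
(a sketch, not the patch verbatim):

	/* In virtio_vsock_remove(): stop each worker under its own mutex
	 * so that no handler touches the device afterwards. */
	mutex_lock(&vsock->rx_lock);
	vsock->rx_run = false;
	mutex_unlock(&vsock->rx_lock);

	mutex_lock(&vsock->tx_lock);
	vsock->tx_run = false;
	mutex_unlock(&vsock->tx_lock);

	mutex_lock(&vsock->event_lock);
	vsock->event_run = false;
	mutex_unlock(&vsock->event_lock);

	/* The workers bail out on their run flags, so the device can now
	 * be reset and its virtqueues deleted safely. */
	vdev->config->reset(vdev);
	vdev->config->del_vqs(vdev);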

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Jason Wang


On 2019/7/3 9:08 PM, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:

On 2019/7/3 7:52 PM, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 5:13 PM, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here:https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.
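
For illustration, under this proposal a request would presumably be issued
by writing one such message into CONFIG_REGION, roughly like this (a
sketch; device_fd and config_offset are assumed to come from the VFIO
device open and a REGION_INFO lookup):

#include <unistd.h>
#include <linux/vhost.h>

/* struct vhost_vfio_op as defined above. */
static void set_vring_num(int device_fd, off_t config_offset)
{
	struct vhost_vfio_op op = {
		.request = VHOST_SET_VRING_NUM,	/* reused vhost ioctl cmd */
		.size = sizeof(op.payload.state),
		.payload.state = { .index = 0, .num = 256 },
	};

	pwrite(device_fd, &op, sizeof(op), config_offset);
}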

Still the same comment as on V1: what's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..


I believe either of the following should be better:

- using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
extend it with e.g. a notify region. The advantage is that all existing
userspace programs could be reused without modification (or with minimal
modification). And the vhost API hides lots of details that need not be
understood by the application (e.g. in the case of a container).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

Can we simply add them into the ioctl of mdev_parent_ops?

Right, either way, these ioctls have to be, and just need to be,
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is which file descriptor the userspace
will do the ioctl() on. So I'm wondering, do you mean letting the
userspace do the ioctl() on the VFIO device fd of the mdev
device?



Yes. Is there any other way btw?

Thanks


Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Tiwei Bie
On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:
> On 2019/7/3 9:08 PM, Tiwei Bie wrote:
> > On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:
> > > On 2019/7/3 7:52 PM, Tiwei Bie wrote:
> > > > On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:
> > > > > On 2019/7/3 5:13 PM, Tiwei Bie wrote:
> > > > > > Details about this can be found here:
> > > > > > 
> > > > > > https://lwn.net/Articles/750770/
> > > > > > 
> > > > > > What's new in this version
> > > > > > ==
> > > > > > 
> > > > > > A new VFIO device type is introduced - vfio-vhost. This addressed
> > > > > > some comments from here:https://patchwork.ozlabs.org/cover/984763/
> > > > > > 
> > > > > > Below is the updated device interface:
> > > > > > 
> > > > > > Currently, there are two regions of this device: 1) CONFIG_REGION
> > > > > > (VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
> > > > > > device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
> > > > > > can be used to notify the device.
> > > > > > 
> > > > > > 1. CONFIG_REGION
> > > > > > 
> > > > > > The region described by CONFIG_REGION is the main control interface.
> > > > > > Messages will be written to or read from this region.
> > > > > > 
> > > > > > The message type is determined by the `request` field in message
> > > > > > header. The message size is encoded in the message header too.
> > > > > > The message format looks like this:
> > > > > > 
> > > > > > struct vhost_vfio_op {
> > > > > > __u64 request;
> > > > > > __u32 flags;
> > > > > > /* Flag values: */
> > > > > > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > > > > > __u32 size;
> > > > > > union {
> > > > > > __u64 u64;
> > > > > > struct vhost_vring_state state;
> > > > > > struct vhost_vring_addr addr;
> > > > > > } payload;
> > > > > > };
> > > > > > 
> > > > > > The existing vhost-kernel ioctl cmds are reused as the message
> > > > > > requests in above structure.
> > > > > Still the same comment as on V1: what's the advantage of inventing a
> > > > > new protocol?
> > > > I'm trying to make it work in VFIO's way..
> > > > 
> > > > > I believe either of the following should be better:
> > > > > 
> > > > > - using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL
> > > > > and extend it with e.g. a notify region. The advantage is that all
> > > > > existing userspace programs could be reused without modification (or
> > > > > with minimal modification). And the vhost API hides lots of details
> > > > > that need not be understood by the application (e.g. in the case of
> > > > > a container).
> > > > Do you mean reusing vhost's ioctl on VFIO device fd directly,
> > > > or introducing another mdev driver (i.e. vhost_mdev instead of
> > > > using the existing vfio_mdev) for mdev device?
> > > Can we simply add them into the ioctl of mdev_parent_ops?
> > Right, either way, these ioctls have to be, and just need to be,
> > added in the ioctl of the mdev_parent_ops. But another thing we
> > also need to consider is which file descriptor the userspace
> > will do the ioctl() on. So I'm wondering, do you mean letting the
> > userspace do the ioctl() on the VFIO device fd of the mdev
> > device?
> > 
> 
> Yes.

Got it! I'm not sure what Alex's opinion is on this. If we all
agree with this, I can do it this way.

> Is there any other way btw?

Just a quick thought, and maybe a totally bad idea. I was thinking
it might be odd to do non-VFIO ioctls on a VFIO device fd. So I was
wondering whether it's possible to allow binding another mdev driver
(e.g. vhost_mdev) to the supported mdev devices. The new mdev driver,
vhost_mdev, can provide similar ways to let userspace open the mdev
device and do the vhost ioctls on it. To distinguish from the
vfio_mdev compatible mdev devices, the device API of the new
vhost_mdev compatible mdev devices might be e.g. "vhost-net" for net?

So in VFIO case, the device will be for passthru directly. And
in VHOST case, the device can be used to accelerate the existing
virtualized devices.

What do you think?

Thanks,
Tiwei
> 
> Thanks
> 

Re: [RFC v2] vhost: introduce mdev based hardware vhost backend

2019-07-03 Thread Jason Wang


On 2019/7/4 2:21 PM, Tiwei Bie wrote:

On Thu, Jul 04, 2019 at 12:31:48PM +0800, Jason Wang wrote:

On 2019/7/3 9:08 PM, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 08:16:23PM +0800, Jason Wang wrote:

On 2019/7/3 7:52 PM, Tiwei Bie wrote:

On Wed, Jul 03, 2019 at 06:09:51PM +0800, Jason Wang wrote:

On 2019/7/3 5:13 PM, Tiwei Bie wrote:

Details about this can be found here:

https://lwn.net/Articles/750770/

What's new in this version
==

A new VFIO device type is introduced - vfio-vhost. This addressed
some comments from here:https://patchwork.ozlabs.org/cover/984763/

Below is the updated device interface:

Currently, there are two regions of this device: 1) CONFIG_REGION
(VFIO_VHOST_CONFIG_REGION_INDEX), which can be used to setup the
device; 2) NOTIFY_REGION (VFIO_VHOST_NOTIFY_REGION_INDEX), which
can be used to notify the device.

1. CONFIG_REGION

The region described by CONFIG_REGION is the main control interface.
Messages will be written to or read from this region.

The message type is determined by the `request` field in message
header. The message size is encoded in the message header too.
The message format looks like this:

struct vhost_vfio_op {
__u64 request;
__u32 flags;
/* Flag values: */
 #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
__u32 size;
union {
__u64 u64;
struct vhost_vring_state state;
struct vhost_vring_addr addr;
} payload;
};

The existing vhost-kernel ioctl cmds are reused as the message
requests in above structure.

Still the same comment as on V1: what's the advantage of inventing a new protocol?

I'm trying to make it work in VFIO's way..


I believe either of the following should be better:

- using vhost ioctl, we can start from SET_VRING_KICK/SET_VRING_CALL and
extend it with e.g. a notify region. The advantage is that all existing
userspace programs could be reused without modification (or with minimal
modification). And the vhost API hides lots of details that need not be
understood by the application (e.g. in the case of a container).

Do you mean reusing vhost's ioctl on VFIO device fd directly,
or introducing another mdev driver (i.e. vhost_mdev instead of
using the existing vfio_mdev) for mdev device?

Can we simply add them into the ioctl of mdev_parent_ops?

Right, either way, these ioctls have to be, and just need to be,
added in the ioctl of the mdev_parent_ops. But another thing we
also need to consider is which file descriptor the userspace
will do the ioctl() on. So I'm wondering, do you mean letting the
userspace do the ioctl() on the VFIO device fd of the mdev
device?


Yes.

Got it! I'm not sure what Alex's opinion is on this. If we all
agree with this, I can do it this way.


Is there any other way btw?

Just a quick thought, and maybe a totally bad idea.



It's not for sure :)



I was thinking it might be odd to do non-VFIO ioctls on a VFIO
device fd. So I was wondering whether it's possible to allow binding
another mdev driver (e.g. vhost_mdev) to the supported mdev devices.
The new mdev driver, vhost_mdev, can provide similar ways to let
userspace open the mdev device and do the vhost ioctls on it. To
distinguish from the vfio_mdev compatible mdev devices, the device
API of the new vhost_mdev compatible mdev devices might be e.g.
"vhost-net" for net?

So in VFIO case, the device will be for passthru directly. And
in VHOST case, the device can be used to accelerate the existing
virtualized devices.

What do you think?



If my understanding is correct, there will be no VFIO ioctl if we go for 
vhost_mdev?


Thanks




Thanks,
Tiwei

Thanks

