Re: [Qemu-devel] [RFC v2 11/12] Add virtio-gpu vhost-user backend

2018-06-10 Thread Gerd Hoffmann
> >> For now, a socketpair is created for the backend to share the rendering
> >> results with qemu via a simple VHOST_GPU protocol.
> >
> > Why isn't this a separate device, like vhost-user-input-pci?
> 
> Ok, let's have vhost-user-gpu-pci and vhost-user-vga, inheriting from
> existing devices.

I'd tend to create separate devices instead of inheriting from the
existing devices.  Aren't the code paths more or less completely
different?  What code is shared between builtin and vhost-user versions
of the devices?

> >> +typedef struct VhostGpuUpdate {
> >> +uint32_t scanout_id;
> >> +uint32_t x;
> >> +uint32_t y;
> >> +uint32_t width;
> >> +uint32_t height;
> >> +uint8_t data[];
> >> +} QEMU_PACKED VhostGpuUpdate;
> >
> > Hmm, when designing a new protocol I think we can do better than just
> > squeezing the pixels into a tcp stream.  Use shared memory instead?  Due
> > to vhost we are limited to linux anyway, so we might even consider stuff
> > like dmabufs here.
> 
> Well, my goal is not to invent a new spice or wayland protocol :) I
> don't care much about 2d performance at this point, more about 3d. Can
> we leave 2d improvements for another day? Besides, what would dmabuf
> bring us for 2d compared to shmem?

Well, you need dma-bufs for 3d anyway, so why not use them for 2d too?
I don't think we need separate code paths for 2d vs. 3d updates.

> There seems to be a lot of overhead with the roundtrip vhost-user ->
> qemu -> spice worker -> spice client -> wayland/x11 -> gpu already
> (but this isn't necessarily so bad at 60fps or less).
> Ideally, I would like to bypass qemu & spice for local rendering, but
> I don't think wayland supports that kind of nested window composition
> (at least, tracing the messages of weston --nested doesn't show that
> kind of optimization).

Yep, a direct vhost-user -> wayland path makes sense.  Using dma-bufs
for both 2d and 3d should simplify that too (again: one code path
instead of two).

What do you mean by nested window composition?

cheers,
  Gerd




Re: [Qemu-devel] [RFC v2 11/12] Add virtio-gpu vhost-user backend

2018-06-08 Thread Marc-André Lureau
On Fri, Jun 8, 2018 at 7:25 PM, Marc-André Lureau
 wrote:
> Hi
>
> On Mon, Jun 4, 2018 at 11:37 AM, Gerd Hoffmann  wrote:
>> On Fri, Jun 01, 2018 at 06:27:48PM +0200, Marc-André Lureau wrote:
>>> Add to virtio-gpu devices a "vhost-user" property. When set, the
>>> associated vhost-user backend is used to handle the virtio rings.
>>>
>>> For now, a socketpair is created for the backend to share the rendering
>>> results with qemu via a simple VHOST_GPU protocol.
>>
>> Why isn't this a separate device, like vhost-user-input-pci?
>
> Ok, let's have vhost-user-gpu-pci and vhost-user-vga, inheriting from
> existing devices.
>
>>> +typedef struct VhostGpuUpdate {
>>> +uint32_t scanout_id;
>>> +uint32_t x;
>>> +uint32_t y;
>>> +uint32_t width;
>>> +uint32_t height;
>>> +uint8_t data[];
>>> +} QEMU_PACKED VhostGpuUpdate;
>>
>> Hmm, when designing a new protocol I think we can do better than just
>> squeezing the pixels into a tcp stream.  Use shared memory instead?  Due
>> to vhost we are limited to linux anyway, so we might even consider stuff
>> like dmabufs here.
>
> Well, my goal is not to invent a new spice or wayland protocol :) I
> don't care much about 2d performance at this point, more about 3d. Can
> we leave 2d improvements for another day? Besides, what would dmabuf
> bring us for 2d compared to shmem?
>
> There seems to be a lot of overhead with the roundtrip vhost-user ->
> qemu -> spice worker -> spice client -> wayland/x11 -> gpu already
> (but this isn't necessarily so bad at 60fps or less).
> Ideally, I would like to bypass qemu & spice for local rendering, but
> I don't think wayland supports that kind of nested window composition
> (at least, tracing the messages of weston --nested doesn't show that
> kind of optimization).
>
> FWIW, here are some Unigine Heaven 4.0 benchmarks (probably within +-10%):
>
> qemu-gtk/egl+virtio-gpu: fps: 2.6 / score: 64
> qemu-gtk/egl+vhost-user-gpu: fps: 12.9 / score: 329
>
> spice+virtio-gpu: fps: 2.8 / score: 70
> spice+vhost-user-gpu: fps: 12.1 / score: 304
>
> That should give an extra motivation :)
>

(host is fps: 31.1 / score: 784)



-- 
Marc-André Lureau



Re: [Qemu-devel] [RFC v2 11/12] Add virtio-gpu vhost-user backend

2018-06-08 Thread Marc-André Lureau
Hi

On Mon, Jun 4, 2018 at 11:37 AM, Gerd Hoffmann  wrote:
> On Fri, Jun 01, 2018 at 06:27:48PM +0200, Marc-André Lureau wrote:
>> Add to virtio-gpu devices a "vhost-user" property. When set, the
>> associated vhost-user backend is used to handle the virtio rings.
>>
>> For now, a socketpair is created for the backend to share the rendering
>> results with qemu via a simple VHOST_GPU protocol.
>
> Why isn't this a separate device, like vhost-user-input-pci?

Ok, let's have vhost-user-gpu-pci and vhost-user-vga, inheriting from
existing devices.

>> +typedef struct VhostGpuUpdate {
>> +uint32_t scanout_id;
>> +uint32_t x;
>> +uint32_t y;
>> +uint32_t width;
>> +uint32_t height;
>> +uint8_t data[];
>> +} QEMU_PACKED VhostGpuUpdate;
>
> Hmm, when designing a new protocol I think we can do better than just
> squeezing the pixels into a tcp stream.  Use shared memory instead?  Due
> to vhost we are limited to linux anyway, so we might even consider stuff
> like dmabufs here.

Well, my goal is not to invent a new spice or wayland protocol :) I
don't care much about 2d performance at this point, more about 3d. Can
we leave 2d improvements for another day? Besides, what would dmabuf
bring us for 2d compared to shmem?

There seems to be a lot of overhead with the roundtrip vhost-user ->
qemu -> spice worker -> spice client -> wayland/x11 -> gpu already
(but this isn't necessarily so bad at 60fps or less).
Ideally, I would like to bypass qemu & spice for local rendering, but
I don't think wayland supports that kind of nested window composition
(at least, tracing the messages of weston --nested doesn't show that
kind of optimization).

FWIW, here are some Unigine Heaven 4.0 benchmarks (probably within +-10%):

qemu-gtk/egl+virtio-gpu: fps: 2.6 / score: 64
qemu-gtk/egl+vhost-user-gpu: fps: 12.9 / score: 329

spice+virtio-gpu: fps: 2.8 / score: 70
spice+vhost-user-gpu: fps: 12.1 / score: 304

That should give an extra motivation :)

-- 
Marc-André Lureau



Re: [Qemu-devel] [RFC v2 11/12] Add virtio-gpu vhost-user backend

2018-06-04 Thread Gerd Hoffmann
On Fri, Jun 01, 2018 at 06:27:48PM +0200, Marc-André Lureau wrote:
> Add to virtio-gpu devices a "vhost-user" property. When set, the
> associated vhost-user backend is used to handle the virtio rings.
> 
> For now, a socketpair is created for the backend to share the rendering
> results with qemu via a simple VHOST_GPU protocol.

Why isn't this a separate device, like vhost-user-input-pci?

> +typedef struct VhostGpuUpdate {
> +uint32_t scanout_id;
> +uint32_t x;
> +uint32_t y;
> +uint32_t width;
> +uint32_t height;
> +uint8_t data[];
> +} QEMU_PACKED VhostGpuUpdate;

Hmm, when designing a new protocol I think we can do better than just
squeezing the pixels into a tcp stream.  Use shared memory instead?  Due
to vhost we are limited to linux anyway, so we might even consider stuff
like dmabufs here.

cheers,
  Gerd




[Qemu-devel] [RFC v2 11/12] Add virtio-gpu vhost-user backend

2018-06-01 Thread Marc-André Lureau
Add to virtio-gpu devices a "vhost-user" property. When set, the
associated vhost-user backend is used to handle the virtio rings.

For now, a socketpair is created for the backend to share the rendering
results with qemu via a simple VHOST_GPU protocol.

Example usage:
-object vhost-user-backend,id=vug,cmd="./vhost-user-gpu"
-device virtio-vga,virgl=true,vhost-user=vug

Signed-off-by: Marc-André Lureau 
---
 include/hw/virtio/virtio-gpu.h |   9 +
 include/ui/console.h   |   1 +
 hw/display/vhost-gpu.c | 290 +
 hw/display/virtio-gpu-3d.c |   8 +-
 hw/display/virtio-gpu-pci.c|   5 +
 hw/display/virtio-gpu.c|  77 -
 hw/display/virtio-vga.c|   5 +
 ui/spice-display.c |   3 +-
 hw/display/Makefile.objs   |   2 +-
 9 files changed, 393 insertions(+), 7 deletions(-)
 create mode 100644 hw/display/vhost-gpu.c

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 79bb3fb3dd..7cd514175a 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -19,6 +19,7 @@
 #include "ui/console.h"
 #include "hw/virtio/virtio.h"
 #include "qemu/log.h"
+#include "sysemu/vhost-user-backend.h"
 
 #include "standard-headers/linux/virtio_gpu.h"
 #define TYPE_VIRTIO_GPU "virtio-gpu-device"
@@ -88,6 +89,9 @@ struct virtio_gpu_ctrl_command {
 typedef struct VirtIOGPU {
 VirtIODevice parent_obj;
 
+VhostUserBackend *vhost;
+CharBackend vhost_chr;
+
 QEMUBH *ctrl_bh;
 QEMUBH *cursor_bh;
 VirtQueue *ctrl_vq;
@@ -103,6 +107,7 @@ typedef struct VirtIOGPU {
 QTAILQ_HEAD(, virtio_gpu_ctrl_command) fenceq;
 
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
+QemuDmaBuf dmabuf[VIRTIO_GPU_MAX_SCANOUTS];
 struct virtio_gpu_requested_state req_state[VIRTIO_GPU_MAX_SCANOUTS];
 
 struct virtio_gpu_conf conf;
@@ -171,4 +176,8 @@ void virtio_gpu_virgl_reset(VirtIOGPU *g);
 void virtio_gpu_gl_block(void *opaque, bool block);
 int virtio_gpu_virgl_init(VirtIOGPU *g);
 int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
+
+/* vhost-gpu.c */
+int vhost_gpu_init(VirtIOGPU *g, Error **errp);
+
 #endif
diff --git a/include/ui/console.h b/include/ui/console.h
index 981b519dde..fb969caf70 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -186,6 +186,7 @@ struct QemuDmaBuf {
 uint32_t  stride;
 uint32_t  fourcc;
 uint32_t  texture;
+bool  y0_top;
 };
 
 typedef struct DisplayChangeListenerOps {
diff --git a/hw/display/vhost-gpu.c b/hw/display/vhost-gpu.c
new file mode 100644
index 00..42d9143d3d
--- /dev/null
+++ b/hw/display/vhost-gpu.c
@@ -0,0 +1,290 @@
+/*
+ * Virtio vhost GPU Device
+ *
+ * Copyright Red Hat, Inc. 2016
+ *
+ * Authors:
+ * Marc-André Lureau 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/virtio/virtio-gpu.h"
+#include "chardev/char-fe.h"
+#include "qapi/error.h"
+
+typedef enum VhostGpuRequest {
+VHOST_GPU_NONE = 0,
+VHOST_GPU_CURSOR_POS,
+VHOST_GPU_CURSOR_POS_HIDE,
+VHOST_GPU_CURSOR_UPDATE,
+VHOST_GPU_SCANOUT,
+VHOST_GPU_UPDATE,
+VHOST_GPU_GL_SCANOUT,
+VHOST_GPU_GL_UPDATE,
+} VhostGpuRequest;
+
+typedef struct VhostGpuCursorPos {
+uint32_t scanout_id;
+uint32_t x;
+uint32_t y;
+} QEMU_PACKED VhostGpuCursorPos;
+
+typedef struct VhostGpuCursorUpdate {
+VhostGpuCursorPos pos;
+uint32_t hot_x;
+uint32_t hot_y;
+uint32_t data[64 * 64];
+} QEMU_PACKED VhostGpuCursorUpdate;
+
+typedef struct VhostGpuScanout {
+uint32_t scanout_id;
+uint32_t width;
+uint32_t height;
+} QEMU_PACKED VhostGpuScanout;
+
+typedef struct VhostGpuGlScanout {
+uint32_t scanout_id;
+uint32_t x;
+uint32_t y;
+uint32_t width;
+uint32_t height;
+uint32_t fd_width;
+uint32_t fd_height;
+uint32_t fd_stride;
+uint32_t fd_flags;
+int fd_drm_fourcc;
+} QEMU_PACKED VhostGpuGlScanout;
+
+typedef struct VhostGpuUpdate {
+uint32_t scanout_id;
+uint32_t x;
+uint32_t y;
+uint32_t width;
+uint32_t height;
+uint8_t data[];
+} QEMU_PACKED VhostGpuUpdate;
+
+typedef struct VhostGpuMsg {
+VhostGpuRequest request;
+uint32_t size; /* the following payload size */
+union {
+VhostGpuCursorPos cursor_pos;
+VhostGpuCursorUpdate cursor_update;
+VhostGpuScanout scanout;
+VhostGpuUpdate update;
+VhostGpuGlScanout gl_scanout;
+} payload;
+} QEMU_PACKED VhostGpuMsg;
+
+static VhostGpuMsg m __attribute__ ((unused));
+#define VHOST_GPU_HDR_SIZE (sizeof(m.request) + sizeof(m.size))
+
+static void vhost_gpu_handle_cursor(VirtIOGPU *g, VhostGpuMsg *msg)
+{
+VhostGpuCursorPos *pos = &msg->payload.cursor_pos;
+struct virtio_gpu_scanout *s;
+
+if (pos->scanout_id >= g->conf.max_outputs) {
+return;
+}
+s = &g->scanout