Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render

2024-07-08 Thread Kim, Dongwon

On 7/2/2024 11:26 PM, Marc-André Lureau wrote:

Hi

On Tue, Jul 2, 2024 at 10:11 PM Kim, Dongwon > wrote:


Hi Marc-André,

On 7/2/2024 3:29 AM, Marc-André Lureau wrote:
 > Hi
 >
 > On Fri, Jun 21, 2024 at 3:20 AM mailto:dongwon@intel.com>
 > >> wrote:
 >
 >     From: Dongwon Kim mailto:dongwon@intel.com> >>
 >
 >     Introducing new virtio-gpu param, 'render_sync' when guest
scanout blob
 >     is used (blob=true). The new param is used to specify when to
start
 >     rendering a guest scanout frame.
 >
 >     By default (and so far) rendering of the guest frame is
started in
 >     the draw event to make sure guest display update is
sychronized with
 >     host's vsync. But this method inevitably brings some extra
wait because
 >     most of time, the draw event is not happening right after the
guest
 >     scanout frame is flushed.
 >
 >
 >     This delay often makes the guest stuck at certain frame for
too long and
 >     causes general performance degradation of graphic workloads
on the
 >     guest's
 >     side especially when the display update rate is high. This
unwanted perf
 >     drop can be reduced if the guest scanout frame is rendered as
soon
 >     as it is
 >     flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl
display
 >     pipeline can be unblocked a lot earlier in this case so that
guest can
 >     move to the next display frame right away.
 >
 >     However, this "asynchrounous" render mode may cause some waste of
 >     resources
 >     as the guest could produce more frames than what are actually
displayed
 >     on the host display. This is similar as running rendering
apps with
 >     no vblank
 >     or vsync option. This is why this feature should stay as
optional.
 >
 >
 > Indeed, I don't see much point in doing so, it's pure waste. If
you want
 > to benchmark something perhaps. But then, why not simply run a
benchmark
 > offscreen?

Benchmark score is not the primary reason. The problem we want to avoid
is the laggy display update due to too many asynchronous plane updates
happening in the guest in certain situations like when moving SW mouse
cursor rigorously on the guest. This is because the fence (as well as
response for the resource-flush cmd) is signaled in the draw event.


Presumably, you won't get more frames displayed (perhaps even less due 
to extra load), so why is it laggy? Is it because the guest is doing too 
much buffering? 


Yes, I believe so. Virtio-gpu driver can't issue another resource flush 
as the previous frame is still on hold by the host. But there are many 
plane-update requests due to frame buffer updates because of the 
movement of SW mouse cursor in between. BTW, we currently use dirtyFB 
update on the guest running Xorg. Maybe using pageflip would make things 
better but we haven't tried it yet.


Because the mouse event queue isn't being drained

between scanouts?






 >
 >
 >     The param 'render_sync' is set to 'true' by default and this
is in line
 >     with traditional way while setting it to 'false' is basically
enabling
 >     this asynchronouse mode.
 >
 >
 > As it stands now, the option should actually be on the display
backend
 > (gtk/gtk-egl), it's not implemented for other backends.

We can help to deploy this concept to other backends if needed.


Why not make it a gtk display option instead?


That is possible but I made it virtio-gpu option as this is specific to 
virtio-gpu backend. We can consider moving it to gtk layer if it makes 
more sense.





 >
 > I am not convinced this is generally useful to be an extra option
though.

I initially thought this should be the default because the virtio-gpu
guest should not be hold by the host for too long in any cases. And
this
new approach is comparable to the default mode with blob=false where
the
guest is released as soon as the resource flush is done. Only
difference
is the resource synchronization using the fence.


virgl should be blocking rendering too

Could you detail your testing setup ?


We use ubuntu linux as a guest OS but it's modified for our GFX SRIOV
setup. I am not sure if this is the details you are looking for but here 
is it: we virtualize our GPU using SRIOV. So individual linux guest have 
their dedicated portion of GPU device that is detected and worked as 
"render-only" device. And since it is render-only, we pair it with 
virtio-gpu device/driver that provides display function. Scanouts on the 
guest are allocated by virtio-gpu then imported by M

Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render

2024-07-05 Thread Kim, Dongwon

On 7/2/2024 11:26 PM, Marc-André Lureau wrote:

Hi

On Tue, Jul 2, 2024 at 10:11 PM Kim, Dongwon > wrote:


Hi Marc-André,

On 7/2/2024 3:29 AM, Marc-André Lureau wrote:
 > Hi
 >
 > On Fri, Jun 21, 2024 at 3:20 AM mailto:dongwon@intel.com>
 > >> wrote:
 >
 >     From: Dongwon Kim mailto:dongwon@intel.com> >>
 >
 >     Introducing new virtio-gpu param, 'render_sync' when guest
scanout blob
 >     is used (blob=true). The new param is used to specify when to
start
 >     rendering a guest scanout frame.
 >
 >     By default (and so far) rendering of the guest frame is
started in
 >     the draw event to make sure guest display update is
sychronized with
 >     host's vsync. But this method inevitably brings some extra
wait because
 >     most of time, the draw event is not happening right after the
guest
 >     scanout frame is flushed.
 >
 >
 >     This delay often makes the guest stuck at certain frame for
too long and
 >     causes general performance degradation of graphic workloads
on the
 >     guest's
 >     side especially when the display update rate is high. This
unwanted perf
 >     drop can be reduced if the guest scanout frame is rendered as
soon
 >     as it is
 >     flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl
display
 >     pipeline can be unblocked a lot earlier in this case so that
guest can
 >     move to the next display frame right away.
 >
 >     However, this "asynchrounous" render mode may cause some waste of
 >     resources
 >     as the guest could produce more frames than what are actually
displayed
 >     on the host display. This is similar as running rendering
apps with
 >     no vblank
 >     or vsync option. This is why this feature should stay as
optional.
 >
 >
 > Indeed, I don't see much point in doing so, it's pure waste. If
you want
 > to benchmark something perhaps. But then, why not simply run a
benchmark
 > offscreen?

Benchmark score is not the primary reason. The problem we want to avoid
is the laggy display update due to too many asynchronous plane updates
happening in the guest in certain situations like when moving SW mouse
cursor rigorously on the guest. This is because the fence (as well as
response for the resource-flush cmd) is signaled in the draw event.


Presumably, you won't get more frames displayed (perhaps even less due 
to extra load), so why is it laggy? Is it because the guest is doing too 
much buffering? 


Yes, I believe so. Virtio-gpu driver can't issue another resource flush 
as the previous frame is still on hold by the host. But there are many 
plane-update requests due to frame buffer updates because of the 
movement of SW mouse cursor in between. BTW, we currently use dirtyFB 
update on the guest running Xorg. Maybe using pageflip would make things 
better but we haven't tried it yet.


Because the mouse event queue isn't being drained

between scanouts?






 >
 >
 >     The param 'render_sync' is set to 'true' by default and this
is in line
 >     with traditional way while setting it to 'false' is basically
enabling
 >     this asynchronouse mode.
 >
 >
 > As it stands now, the option should actually be on the display
backend
 > (gtk/gtk-egl), it's not implemented for other backends.

We can help to deploy this concept to other backends if needed.


Why not make it a gtk display option instead?


That is possible but I made it virtio-gpu option as this is specific to 
virtio-gpu backend. We can consider moving it to gtk layer if it makes 
more sense.





 >
 > I am not convinced this is generally useful to be an extra option
though.

I initially thought this should be the default because the virtio-gpu
guest should not be hold by the host for too long in any cases. And
this
new approach is comparable to the default mode with blob=false where
the
guest is released as soon as the resource flush is done. Only
difference
is the resource synchronization using the fence.


virgl should be blocking rendering too

Could you detail your testing setup ?


We use ubuntu linux as a guest OS but it's modified for our GFX SRIOV
setup. I am not sure if this is the details you are looking for but here 
is it: we virtualize our GPU using SRIOV. So individual linux guest have 
their dedicated portion of GPU device that is detected and worked as 
"render-only" device. And since it is render-only, we pair it with 
virtio-gpu device/driver that provides display feature. Scanouts on the 
guest are allocated by virtio-gpu then imported by Me

Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render

2024-07-02 Thread Marc-André Lureau
Hi

On Tue, Jul 2, 2024 at 10:11 PM Kim, Dongwon  wrote:

> Hi Marc-André,
>
> On 7/2/2024 3:29 AM, Marc-André Lureau wrote:
> > Hi
> >
> > On Fri, Jun 21, 2024 at 3:20 AM  > > wrote:
> >
> > From: Dongwon Kim  dongwon@intel.com>>
> >
> > Introducing new virtio-gpu param, 'render_sync' when guest scanout
> blob
> > is used (blob=true). The new param is used to specify when to start
> > rendering a guest scanout frame.
> >
> > By default (and so far) rendering of the guest frame is started in
> > the draw event to make sure guest display update is sychronized with
> > host's vsync. But this method inevitably brings some extra wait
> because
> > most of time, the draw event is not happening right after the guest
> > scanout frame is flushed.
> >
> >
> > This delay often makes the guest stuck at certain frame for too long
> and
> > causes general performance degradation of graphic workloads on the
> > guest's
> > side especially when the display update rate is high. This unwanted
> perf
> > drop can be reduced if the guest scanout frame is rendered as soon
> > as it is
> > flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
> > pipeline can be unblocked a lot earlier in this case so that guest
> can
> > move to the next display frame right away.
> >
> > However, this "asynchrounous" render mode may cause some waste of
> > resources
> > as the guest could produce more frames than what are actually
> displayed
> > on the host display. This is similar as running rendering apps with
> > no vblank
> > or vsync option. This is why this feature should stay as optional.
> >
> >
> > Indeed, I don't see much point in doing so, it's pure waste. If you want
> > to benchmark something perhaps. But then, why not simply run a benchmark
> > offscreen?
>
> Benchmark score is not the primary reason. The problem we want to avoid
> is the laggy display update due to too many asynchronous plane updates
> happening in the guest in certain situations like when moving SW mouse
> cursor rigorously on the guest. This is because the fence (as well as
> response for the resource-flush cmd) is signaled in the draw event.
>
>
Presumably, you won't get more frames displayed (perhaps even less due to
extra load), so why is it laggy? Is it because the guest is doing too much
buffering? Because the mouse event queue isn't being drained between
scanouts?


> >
> >
> > The param 'render_sync' is set to 'true' by default and this is in
> line
> > with traditional way while setting it to 'false' is basically
> enabling
> > this asynchronouse mode.
> >
> >
> > As it stands now, the option should actually be on the display backend
> > (gtk/gtk-egl), it's not implemented for other backends.
>
> We can help to deploy this concept to other backends if needed.
>

Why not make it a gtk display option instead?


> >
> > I am not convinced this is generally useful to be an extra option though.
>
> I initially thought this should be the default because the virtio-gpu
> guest should not be hold by the host for too long in any cases. And this
> new approach is comparable to the default mode with blob=false where the
> guest is released as soon as the resource flush is done. Only difference
> is the resource synchronization using the fence.
>

virgl should be blocking rendering too

Could you detail your testing setup ?

thanks


> >
> > Dongwon Kim (4):
> >hw/display/virtio-gpu: Introducing render_sync param
> >ui/egl-helpers: Consolidates create-sync and create-fence
> >ui/gtk-egl: Start rendering of guest blob scanout if render_sync
> is
> >  off
> >ui/gtk-gl-draw: Start rendering of guest blob scanout if
> render_sync
> >  is off
> >
> >   include/hw/virtio/virtio-gpu.h  |  3 ++
> >   include/ui/dmabuf.h |  4 +-
> >   include/ui/egl-helpers.h|  3 +-
> >   hw/display/vhost-user-gpu.c |  3 +-
> >   hw/display/virtio-gpu-udmabuf.c |  3 +-
> >   hw/display/virtio-gpu.c |  2 +
> >   hw/vfio/display.c   |  3 +-
> >   ui/dbus-listener.c  |  2 +-
> >   ui/dmabuf.c | 11 +++-
> >   ui/egl-helpers.c| 27 --
> >   ui/gtk-egl.c| 93
> ++---
> >   ui/gtk-gl-area.c| 90
> +++
> >   12 files changed, 146 insertions(+), 98 deletions(-)
> >
> > --
> > 2.34.1
> >
> >
> >
> >
> > --
> > Marc-André Lureau
>
>

-- 
Marc-André Lureau


Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render

2024-07-02 Thread Kim, Dongwon

Hi Marc-André,

On 7/2/2024 3:29 AM, Marc-André Lureau wrote:

Hi

On Fri, Jun 21, 2024 at 3:20 AM > wrote:


From: Dongwon Kim mailto:dongwon@intel.com>>

Introducing new virtio-gpu param, 'render_sync' when guest scanout blob
is used (blob=true). The new param is used to specify when to start
rendering a guest scanout frame.

By default (and so far) rendering of the guest frame is started in
the draw event to make sure guest display update is sychronized with
host's vsync. But this method inevitably brings some extra wait because
most of time, the draw event is not happening right after the guest
scanout frame is flushed.


This delay often makes the guest stuck at certain frame for too long and
causes general performance degradation of graphic workloads on the
guest's
side especially when the display update rate is high. This unwanted perf
drop can be reduced if the guest scanout frame is rendered as soon
as it is
flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
pipeline can be unblocked a lot earlier in this case so that guest can
move to the next display frame right away.

However, this "asynchrounous" render mode may cause some waste of
resources
as the guest could produce more frames than what are actually displayed
on the host display. This is similar as running rendering apps with
no vblank
or vsync option. This is why this feature should stay as optional.


Indeed, I don't see much point in doing so, it's pure waste. If you want 
to benchmark something perhaps. But then, why not simply run a benchmark 
offscreen?


Benchmark score is not the primary reason. The problem we want to avoid 
is the laggy display update due to too many asynchronous plane updates 
happening in the guest in certain situations like when moving SW mouse 
cursor rigorously on the guest. This is because the fence (as well as 
response for the resource-flush cmd) is signaled in the draw event.





The param 'render_sync' is set to 'true' by default and this is in line
with traditional way while setting it to 'false' is basically enabling
this asynchronouse mode.


As it stands now, the option should actually be on the display backend 
(gtk/gtk-egl), it's not implemented for other backends.


We can help to deploy this concept to other backends if needed.



I am not convinced this is generally useful to be an extra option though.


I initially thought this should be the default because the virtio-gpu 
guest should not be hold by the host for too long in any cases. And this 
new approach is comparable to the default mode with blob=false where the 
guest is released as soon as the resource flush is done. Only difference 
is the resource synchronization using the fence.




Dongwon Kim (4):
   hw/display/virtio-gpu: Introducing render_sync param
   ui/egl-helpers: Consolidates create-sync and create-fence
   ui/gtk-egl: Start rendering of guest blob scanout if render_sync is
     off
   ui/gtk-gl-draw: Start rendering of guest blob scanout if render_sync
     is off

  include/hw/virtio/virtio-gpu.h  |  3 ++
  include/ui/dmabuf.h             |  4 +-
  include/ui/egl-helpers.h        |  3 +-
  hw/display/vhost-user-gpu.c     |  3 +-
  hw/display/virtio-gpu-udmabuf.c |  3 +-
  hw/display/virtio-gpu.c         |  2 +
  hw/vfio/display.c               |  3 +-
  ui/dbus-listener.c              |  2 +-
  ui/dmabuf.c                     | 11 +++-
  ui/egl-helpers.c                | 27 --
  ui/gtk-egl.c                    | 93 ++---
  ui/gtk-gl-area.c                | 90 +++
  12 files changed, 146 insertions(+), 98 deletions(-)

-- 
2.34.1





--
Marc-André Lureau





Re: [RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render

2024-07-02 Thread Marc-André Lureau
Hi

On Fri, Jun 21, 2024 at 3:20 AM  wrote:

> From: Dongwon Kim 
>
> Introducing new virtio-gpu param, 'render_sync' when guest scanout blob
> is used (blob=true). The new param is used to specify when to start
> rendering a guest scanout frame.
>
> By default (and so far) rendering of the guest frame is started in
> the draw event to make sure guest display update is sychronized with
> host's vsync. But this method inevitably brings some extra wait because
> most of time, the draw event is not happening right after the guest
> scanout frame is flushed.
>

> This delay often makes the guest stuck at certain frame for too long and
> causes general performance degradation of graphic workloads on the guest's
> side especially when the display update rate is high. This unwanted perf
> drop can be reduced if the guest scanout frame is rendered as soon as it is
> flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
> pipeline can be unblocked a lot earlier in this case so that guest can
> move to the next display frame right away.
>
> However, this "asynchrounous" render mode may cause some waste of resources
> as the guest could produce more frames than what are actually displayed
> on the host display. This is similar as running rendering apps with no
> vblank
> or vsync option. This is why this feature should stay as optional.
>

Indeed, I don't see much point in doing so, it's pure waste. If you want to
benchmark something perhaps. But then, why not simply run a benchmark
offscreen?


>
> The param 'render_sync' is set to 'true' by default and this is in line
> with traditional way while setting it to 'false' is basically enabling
> this asynchronouse mode.
>
>
As it stands now, the option should actually be on the display backend
(gtk/gtk-egl), it's not implemented for other backends.

I am not convinced this is generally useful to be an extra option though.


> Dongwon Kim (4):
>   hw/display/virtio-gpu: Introducing render_sync param
>   ui/egl-helpers: Consolidates create-sync and create-fence
>   ui/gtk-egl: Start rendering of guest blob scanout if render_sync is
> off
>   ui/gtk-gl-draw: Start rendering of guest blob scanout if render_sync
> is off
>
>  include/hw/virtio/virtio-gpu.h  |  3 ++
>  include/ui/dmabuf.h |  4 +-
>  include/ui/egl-helpers.h|  3 +-
>  hw/display/vhost-user-gpu.c |  3 +-
>  hw/display/virtio-gpu-udmabuf.c |  3 +-
>  hw/display/virtio-gpu.c |  2 +
>  hw/vfio/display.c   |  3 +-
>  ui/dbus-listener.c  |  2 +-
>  ui/dmabuf.c | 11 +++-
>  ui/egl-helpers.c| 27 --
>  ui/gtk-egl.c| 93 ++---
>  ui/gtk-gl-area.c| 90 +++
>  12 files changed, 146 insertions(+), 98 deletions(-)
>
> --
> 2.34.1
>
>
>

-- 
Marc-André Lureau


[RFC PATCH 0/4] hw/display/virtio-gpu: Introducing asynchronous guest display render

2024-06-20 Thread dongwon . kim
From: Dongwon Kim 

Introducing new virtio-gpu param, 'render_sync' when guest scanout blob
is used (blob=true). The new param is used to specify when to start
rendering a guest scanout frame.

By default (and so far) rendering of the guest frame is started in
the draw event to make sure guest display update is sychronized with
host's vsync. But this method inevitably brings some extra wait because
most of time, the draw event is not happening right after the guest
scanout frame is flushed.

This delay often makes the guest stuck at certain frame for too long and
causes general performance degradation of graphic workloads on the guest's
side especially when the display update rate is high. This unwanted perf
drop can be reduced if the guest scanout frame is rendered as soon as it is
flushed through 'VIRTIO_GPU_CMD_RESOURCE_FLUSH' msg. The gl display
pipeline can be unblocked a lot earlier in this case so that guest can
move to the next display frame right away.

However, this "asynchrounous" render mode may cause some waste of resources
as the guest could produce more frames than what are actually displayed
on the host display. This is similar as running rendering apps with no vblank
or vsync option. This is why this feature should stay as optional.

The param 'render_sync' is set to 'true' by default and this is in line
with traditional way while setting it to 'false' is basically enabling
this asynchronouse mode.

Dongwon Kim (4):
  hw/display/virtio-gpu: Introducing render_sync param
  ui/egl-helpers: Consolidates create-sync and create-fence
  ui/gtk-egl: Start rendering of guest blob scanout if render_sync is
off
  ui/gtk-gl-draw: Start rendering of guest blob scanout if render_sync
is off

 include/hw/virtio/virtio-gpu.h  |  3 ++
 include/ui/dmabuf.h |  4 +-
 include/ui/egl-helpers.h|  3 +-
 hw/display/vhost-user-gpu.c |  3 +-
 hw/display/virtio-gpu-udmabuf.c |  3 +-
 hw/display/virtio-gpu.c |  2 +
 hw/vfio/display.c   |  3 +-
 ui/dbus-listener.c  |  2 +-
 ui/dmabuf.c | 11 +++-
 ui/egl-helpers.c| 27 --
 ui/gtk-egl.c| 93 ++---
 ui/gtk-gl-area.c| 90 +++
 12 files changed, 146 insertions(+), 98 deletions(-)

-- 
2.34.1