Hi Marc-André,

On 6/13/2024 6:16 AM, Marc-André Lureau wrote:
Hi

On Wed, Jun 12, 2024 at 10:50 PM Kim, Dongwon <dongwon....@intel.com> wrote:

    On 6/11/2024 10:44 PM, Marc-André Lureau wrote:
     > Hi
     >
     > On Wed, Jun 12, 2024 at 5:29 AM Kim, Dongwon
     > <dongwon....@intel.com> wrote:
     >
     >     Hi,
     >
     >     From: Marc-André Lureau <marcandre.lur...@gmail.com>
     >     Sent: Wednesday, June 5, 2024 12:56 AM
     >     To: Kim, Dongwon <dongwon....@intel.com>
     >     Cc: qemu-devel@nongnu.org; Peter Xu <pet...@redhat.com>
     >     Subject: Re: [PATCH] ui/gtk: Wait until the current guest frame is
     >     rendered before switching to RUN_STATE_SAVE_VM
     >
     >     Hi
     >
     >     On Tue, Jun 4, 2024 at 9:49 PM Kim, Dongwon
     >     <dongwon....@intel.com> wrote:
     >     On 6/4/2024 4:12 AM, Marc-André Lureau wrote:
     >      > Hi
     >      >
     >      > On Thu, May 30, 2024 at 2:44 AM
     >      > <dongwon....@intel.com> wrote:
     >      >
     >      >     From: Dongwon <dongwon....@intel.com>
     >      >
     >      >     Make sure rendering of the current frame is finished before
     >      >     switching the run state to RUN_STATE_SAVE_VM by waiting for
     >      >     the egl-sync object to be signaled.
     >      >
     >      >
     >      > Can you expand on what this solves?
     >
     >     In the current scheme, the guest waits for the fence to be
     >     signaled for each frame it submits before moving to the next
     >     frame. If the guest's state is saved while it is still waiting
     >     for the fence, the guest will keep waiting for a fence that was
     >     already signaled a while ago once it is restored to that point.
     >     One way to prevent this is to let it finish the current frame
     >     before changing the state.
     >
     >     After the UI sets a fence, hw_ops->gl_block(true) gets called,
     >     which will block virtio-gpu/virgl from processing commands (until
     >     the fence is signaled and gl_block(false) is called again).
     >
     >     But this "blocking" state is not saved. So how does this affect
     >     save/restore? Please give more details, thanks
     >
     >     Yeah sure. The "blocking" state is not saved, but the guest's
     >     state is saved while it is still waiting for the response to its
     >     last resource-flush virtio msg. That virtio response, by the way,
     >     is only sent to the guest when the pipeline is unblocked (i.e.
     >     when the fence is signaled). Once the guest's state is saved, the
     >     current guest instance continues and receives the response as
     >     usual. The problem happens when we restore the saved guest state,
     >     because the restored guest keeps waiting for a response that was
     >     already sent to the original instance a while ago.
     >
     >
     > Where is the pending response saved? Can you detail how you test this?
     >

    There is no pending response at the guest's restored point, which is
    the problem. The response is sent out after the saving is done.

    Normal cycle:

    resource-flush (scanout flush) -> gl block -> render -> gl unblock
    (after fence is signaled) -> pending response sent out to the guest ->
    guest (virtio-gpu drv) processes the next scanout frame -> (next cycle)
    resource-flush -> gl block ......

    When the vm-state is saved in the middle:

    resource-flush (scanout-flush) -> gl block -> saving vm-state -> render
    -> gl unblock -> pending response (resp #1) sent out to the guest ->
    guest (virtio-gpu drv) processes the next scanout frame -> (next cycle)
    resource-flush -> gl block ......

    Now, we restore the vm-state we saved

    vm-state is restored -> guest (virtio-gpu drv) can't move on as this
    state is still waiting for the response (resp #1)
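
To put the same thing in code form, here is a rough sketch of the host-side
blocking that keeps the response pending. This is not the actual QEMU code;
all of the names below (ExampleGPU, gpu_gl_block, send_pending_flush_response)
are made up for illustration:

/* Rough illustration only -- hypothetical names, not the QEMU source. */
#include <stdbool.h>
#include <stdio.h>

typedef struct ExampleGPU {
    int renderer_blocked;        /* > 0 while the UI still owns the frame */
    bool flush_response_pending; /* resp #1 waiting for the fence */
} ExampleGPU;

static void send_pending_flush_response(ExampleGPU *g)
{
    if (g->flush_response_pending) {
        g->flush_response_pending = false;
        printf("resp #1 sent to the guest\n");
    }
}

static void gpu_gl_block(ExampleGPU *g, bool block)
{
    g->renderer_blocked += block ? 1 : -1;

    if (g->renderer_blocked == 0) {
        /* Only here is the pending resource-flush response sent back to
         * the guest. A snapshot taken before this point contains a guest
         * that is still waiting for it, and nothing in the saved state
         * will send it again after restore. */
        send_pending_flush_response(g);
    }
}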


Ok, so actually it's more of a device state issue than a UI/GTK one. We end up saving a state that doesn't reflect the guest state. My understanding is that the guest is waiting for a fence reply, and we don't save that. Imho, a better fix would be to either save the fenceq (but then, what else is missing to complete the operation on resume?), or have a wait to delay the migration until the fences are flushed.

Regarding the second method you are proposing here - 'have a wait' - I understand you mean delaying the start of the migration, but don't you think the current patch is basically doing a similar thing? Assuming an EGL wait on the sync object is what we need to use for that wait, do you have any suggestion where it should be called other than 'gd_change_runstate'?
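
Just to make the question concrete, the kind of wait I have in mind looks
roughly like the sketch below, assuming the EGL 1.5 sync API (epoxy in QEMU)
and hypothetical helper/parameter names; the display and the per-frame sync
object would have to come from wherever the UI keeps them:

#include <epoxy/egl.h>
#include <assert.h>

/* Hypothetical helper: block until the EGL fence created for the current
 * guest frame has signaled, so the pending resource-flush response goes
 * out before the run state switches to RUN_STATE_SAVE_VM. */
static void wait_for_current_frame(EGLDisplay dpy, EGLSync frame_sync)
{
    EGLint ret;

    if (frame_sync == EGL_NO_SYNC) {
        return;                 /* no frame in flight, nothing to wait for */
    }

    ret = eglClientWaitSync(dpy, frame_sync,
                            EGL_SYNC_FLUSH_COMMANDS_BIT, EGL_FOREVER);
    assert(ret == EGL_CONDITION_SATISFIED);

    eglDestroySync(dpy, frame_sync);
}

Whether something like this belongs in gd_change_runstate or closer to the
migration code is exactly the part I am unsure about.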



    So we need to make sure vm-state is saved after the cycle is completed.

    This situation only happens if you use blob=true with the virtio-gpu
    drv as KMS on the Linux guest. Do you have any similar setup?


No, further details to reproduce would help. Even better would be to have some automated test.

I will think about this. We use a GPU shared via SR-IOV as the GPU device and virtio-gpu as the display device on the guest. I think I need to find and test more general cases.
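
Roughly, the shape of the configuration is something like the command line
below; this is only a sketch, the PCI address is a placeholder, our actual
options differ a bit, and I believe the host also needs udmabuf support for
blob=true scanouts:

qemu-system-x86_64 \
    -machine q35 -accel kvm -m 4G \
    -device vfio-pci,host=0000:00:02.1 \
    -device virtio-gpu-pci,blob=true \
    -display gtk,gl=on \
    ...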



--
Marc-André Lureau

