[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Chen, Jiqian


On 2023/9/20 15:56, Parav Pandit wrote:
> Hi Jiquian,
> 
>> From: Chen, Jiqian 
>> Sent: Wednesday, September 20, 2023 1:24 PM
>>
>> Hi Lingshan,
>> Please reply to your own email thread, below are not related to my patches.
>> Thanks a lot.
> 
> They are related to your patch.
>  Both the patches have overlapping functionalities.
But Lingshan's patch is not meet my need. It clears the SUSPEND bit during 
reset.

> 
> You probably missed my previous response explaining the same at [1].
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00255.html
Yes, I didn't receive this response. 
After reading your responses, I think your concerns are below:
> The point is when driver tells to freeze, it is freeze command and not reset.
> So resume() should not invoke device_reset() when FREEZE+RESUME supported.
The modifications in my Qemu and kernel patches, I just set freeze_mode to be 
S3, and then when guest resume I can change the reset behavior according the S3 
mode, see below callstack:
pci_pm_resume
virtio_pci_restore
virtio_device_restore
virtio_reset_device(set 0 the device status and then 
call virtio_reset to reset device)
drv->restore(virtio_gpu_restore)
So, reset will be done during the restore process(resume invoke the reset), and 
I want to affect the reset behavior during the restore process.

> 

-- 
Best regards,
Jiqian Chen.


RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Parav Pandit
Hi Jiquian,

> From: Chen, Jiqian 
> Sent: Wednesday, September 20, 2023 1:24 PM
> 
> Hi Lingshan,
> Please reply to your own email thread, below are not related to my patches.
> Thanks a lot.

They are related to your patch.
 Both the patches have overlapping functionalities.

You probably missed my previous response explaining the same at [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00255.html



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 3:51 PM, Parav Pandit wrote:

From: Zhu, Lingshan 
Sent: Wednesday, September 20, 2023 1:17 PM

This is not live or device migration. This is restoring the device context

initiated by the driver owning the device.
restore the device context should be done by the hypervisor before setting
DRIVER_OK and waking up the guest, not a concern here, out of spec

The framework is generic for the PCI devices hence, there may not be any 
hypervisor at all. Hence restore operation to trigger on DRIVER_OK setting, 
when previously suspended.

Since we already agree in previous email that re-read until device sets the 
DRIVER_OK, its fine to me. It covers the above restore condition.

OK


Thanks.




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Chen, Jiqian
Hi Lingshan,
Please reply to your own email thread, below are not related to my patches. 
Thanks a lot.

On 2023/9/20 15:47, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 3:35 PM, Parav Pandit wrote:
>>> From: Zhu, Lingshan 
>>> Sent: Wednesday, September 20, 2023 1:00 PM
>>>
>>> On 9/20/2023 3:24 PM, Chen, Jiqian wrote:
 Hi Lingshan,
 It seems you reply to the wrong email thread. They are not related to my
>>> patch.
>>> These reply to Parva's comments.
>>> @Parva, if you want to discuss more about live migration, please reply in my
>>> thread, lets don't flood here.
>> You made the point that "this is not live migration".
>> I am not discussing live migration in this thread.
>>
>> I replied for the point that device restoration from suspend for physical 
>> and hypevisor based device is not a 40nsec worth of work to restore by just 
>> doing a register write.
>> This is not live or device migration. This is restoring the device context 
>> initiated by the driver owning the device.
> restore the device context should be done by the hypervisor before setting 
> DRIVER_OK and waking up the guest, not a concern here, out of spec
> 

-- 
Best regards,
Jiqian Chen.


RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Parav Pandit

> From: Zhu, Lingshan 
> Sent: Wednesday, September 20, 2023 1:17 PM

> > This is not live or device migration. This is restoring the device context
> initiated by the driver owning the device.
> restore the device context should be done by the hypervisor before setting
> DRIVER_OK and waking up the guest, not a concern here, out of spec

The framework is generic for the PCI devices hence, there may not be any 
hypervisor at all. Hence restore operation to trigger on DRIVER_OK setting, 
when previously suspended.

Since we already agree in previous email that re-read until device sets the 
DRIVER_OK, its fine to me. It covers the above restore condition.

Thanks.



RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Parav Pandit

> From: Zhu, Lingshan 
> Sent: Wednesday, September 20, 2023 1:16 PM

[..]
> > In my view, setting the DRIVER_OK is the signal regardless of hypervisor or
> physical device.
> > Hence the re-read is must.
> Yes, as I said below, should verify by re-read.
> >
Thanks.


Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 3:35 PM, Parav Pandit wrote:

From: Zhu, Lingshan 
Sent: Wednesday, September 20, 2023 1:00 PM

On 9/20/2023 3:24 PM, Chen, Jiqian wrote:

Hi Lingshan,
It seems you reply to the wrong email thread. They are not related to my

patch.
These reply to Parva's comments.
@Parva, if you want to discuss more about live migration, please reply in my
thread, lets don't flood here.

You made the point that "this is not live migration".
I am not discussing live migration in this thread.

I replied for the point that device restoration from suspend for physical and 
hypevisor based device is not a 40nsec worth of work to restore by just doing a 
register write.
This is not live or device migration. This is restoring the device context 
initiated by the driver owning the device.
restore the device context should be done by the hypervisor before 
setting DRIVER_OK and waking up the guest, not a concern here, out of spec



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 3:32 PM, Parav Pandit wrote:



From: Zhu, Lingshan 
Sent: Wednesday, September 20, 2023 12:58 PM

On 9/20/2023 3:10 PM, Parav Pandit wrote:

From: Zhu, Lingshan 
Sent: Wednesday, September 20, 2023 12:37 PM

The problem to overcome in [1] is, resume operation needs to be
synchronous

as it involves large part of context to resume back, and hence just
asynchronously setting DRIVER_OK is not enough.

The sw must verify back that device has resumed the operation and
ready to

answer requests.
this is not live migration, all device status and other information
still stay in the device, no need to "resume" context, just resume running.


I am aware that it is not live migration. :)

"Just resuming" involves lot of device setup task. The device implementation

does not know for how long a device is suspended.

So for example, a VM is suspended for 6 hours, hence the device context

could be saved in a slow disk.

Hence, when the resume is done, it needs to setup things again and driver got

to verify before accessing more from the device.
The restore procedures should perform by the hypervisor and done before set
DRIVER_OK and wake up the guest.

Which is the signal to trigger the restore? Which is the trigger in physical 
device when there is no hypervisor?

In my view, setting the DRIVER_OK is the signal regardless of hypervisor or 
physical device.
Hence the re-read is must.

Yes, as I said below, should verify by re-read.



And the hypervisor/driver needs to check the device status by re-reading.

Like resume from a failed LM.

This is slightly different flow than setting the DRIVER_OK for the
first time

device initialization sequence as it does not involve large restoration.

So, to merge two ideas, instead of doing DRIVER_OK to resume, the
driver

should clear the SUSPEND bit and verify that it is out of SUSPEND.

Because driver is still in _OK_ driving the device flipping the SUSPEND bit.

Please read the spec, it says:
The driver MUST NOT clear a device status bit


Yes, this is why either DRIER_OK validation by the driver is needed or Jiqian's

synchronous new register..
so re-read

Yes. re-read until set, Thanks.




-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 3:17 PM, Chen, Jiqian wrote:

Hi Lingshan,

On 2023/9/20 14:58, Zhu, Lingshan wrote:


On 9/20/2023 2:33 PM, Chen, Jiqian wrote:

Hi Lingshan,

On 2023/9/20 13:59, Zhu, Lingshan wrote:

On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:

On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:

When guest vm does S3, Qemu will reset and clear some things of virtio
devices, but guest can't aware that, so that may cause some problems.
For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
resume, that function will destroy render resources of virtio-gpu. As
a result, after guest resume, the display can't come back and we only
saw a black screen. Due to guest can't re-create all the resources, so
we need to let Qemu not to destroy them when S3.

For above purpose, we need a mechanism that allows guests and QEMU to
negotiate their reset behavior. So this patch add a new parameter
named freeze_mode to struct virtio_pci_common_cfg. And when guest
suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
devices can change their reset behavior on Qemu side according to
freeze_mode. What's more, freeze_mode can be used for all virtio
devices to affect the behavior of Qemu, not just virtio gpu device.

Hi Jiqian,

Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq 
state
https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b55...@intel.com/T/

We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
negotiated, the driver can set SUSPEND in the device status to suspend the
device.

When SUSPEND, the device should pause its operations and preserve its 
configurations
in its configuration space.

The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes running.

This is originally to serve live migration, but I think it can also meet your 
needs.

I noticed your series, but I am not sure they are also meet my needs.
If driver write 0 to reset device, can the SUSPEND bit be cleared? 
(pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)

if the driver writes 0, it resets all virtio functionalities. So SUSPEND is 
cleared.

Then your patches are not meet my needs. In my scene, it needs to keep the 
SUSPEND bit util the resume process is complete.
Because in my virtio-gpu scene, when guest resume, it call virtio_reset_device 
to clear all device status bits, and then reset virtio-gpu in Qemu, and then 
destroy render resources, I don't want the resources are destroyed during the 
resume process. So, I add freeze_mode to tell Qemu that guest is doing S3 and 
resources need to be kept.
When a guest set to S3, the hypervisor suspend the guest to RAM, and the 
passthrough-ed device are in low power state.
I am not sure the device can keep its status/context/data, maybe need to 
recover from RAM anyway.


So I suggest not reset the device, there need some code changes in QEMU
When set to S3, SUSPEND the device, then suspend to RAM
When resume from S3, restore from RAM and set DRIVER_OK to resume running.



device reset can also be used to recover the device from fatal errors, so it 
should reset everything in virtio.

If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if 
the reset request is from guest restore process or not, and then I can't change 
the reset behavior.

I think when enter S3, the hypervisor/driver should set SUSPEND to the device. 
And when resume from S3, the hypervisor/driver should
re-write DRIVER_OK to clear SUSPEND, then the device resume running.

Can you send me your patch link on kernel and qemu side? I will take a deep 
look.

There are no patches for qemu/kernel yet, spec first.

Thanks,
Zhu Lingshan

Signed-off-by: Jiqian Chen 
---
    transport-pci.tex | 7 +++
    1 file changed, 7 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..2543536 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -319,6 +319,7 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
    le64 queue_desc;    /* read-write */
    le64 queue_driver;  /* read-write */
    le64 queue_device;  /* read-write */
+    le16 freeze_mode;   /* read-write */
    le16 queue_notif_config_data;   /* read-only for driver */
    le16 queue_reset;   /* read-write */


we can't add fields in the middle of the structure like this -
offset of queue_notif_config_data and queue_reset changes.

   

@@ -393,6 +394,12 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
    \item[\field{queue_device}]
    The driver writes the physical address of Device Area here.  See 
section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
    +\item[\field{freeze_mode}]
+    The driver writes this to set the freeze mode of virtio pci.
+    VIRTIO_PCI_FREEZE_MODE_UNFREEZE - 

RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Parav Pandit

> From: Zhu, Lingshan 
> Sent: Wednesday, September 20, 2023 1:00 PM
> 
> On 9/20/2023 3:24 PM, Chen, Jiqian wrote:
> > Hi Lingshan,
> > It seems you reply to the wrong email thread. They are not related to my
> patch.
> These reply to Parva's comments.
> @Parva, if you want to discuss more about live migration, please reply in my
> thread, lets don't flood here.
You made the point that "this is not live migration".
I am not discussing live migration in this thread.

I replied for the point that device restoration from suspend for physical and 
hypevisor based device is not a 40nsec worth of work to restore by just doing a 
register write.
This is not live or device migration. This is restoring the device context 
initiated by the driver owning the device.


RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Parav Pandit


> From: Zhu, Lingshan 
> Sent: Wednesday, September 20, 2023 12:58 PM
> 
> On 9/20/2023 3:10 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan 
> >> Sent: Wednesday, September 20, 2023 12:37 PM
> >>> The problem to overcome in [1] is, resume operation needs to be
> >>> synchronous
> >> as it involves large part of context to resume back, and hence just
> >> asynchronously setting DRIVER_OK is not enough.
> >>> The sw must verify back that device has resumed the operation and
> >>> ready to
> >> answer requests.
> >> this is not live migration, all device status and other information
> >> still stay in the device, no need to "resume" context, just resume running.
> >>
> > I am aware that it is not live migration. :)
> >
> > "Just resuming" involves lot of device setup task. The device implementation
> does not know for how long a device is suspended.
> > So for example, a VM is suspended for 6 hours, hence the device context
> could be saved in a slow disk.
> > Hence, when the resume is done, it needs to setup things again and driver 
> > got
> to verify before accessing more from the device.
> The restore procedures should perform by the hypervisor and done before set
> DRIVER_OK and wake up the guest.
Which is the signal to trigger the restore? Which is the trigger in physical 
device when there is no hypervisor?

In my view, setting the DRIVER_OK is the signal regardless of hypervisor or 
physical device.
Hence the re-read is must.

> And the hypervisor/driver needs to check the device status by re-reading.
> >
> >> Like resume from a failed LM.
> >>> This is slightly different flow than setting the DRIVER_OK for the
> >>> first time
> >> device initialization sequence as it does not involve large restoration.
> >>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the
> >>> driver
> >> should clear the SUSPEND bit and verify that it is out of SUSPEND.
> >>> Because driver is still in _OK_ driving the device flipping the SUSPEND 
> >>> bit.
> >> Please read the spec, it says:
> >> The driver MUST NOT clear a device status bit
> >>
> > Yes, this is why either DRIER_OK validation by the driver is needed or 
> > Jiqian's
> synchronous new register..
> so re-read
Yes. re-read until set, Thanks.



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 3:24 PM, Chen, Jiqian wrote:

Hi Lingshan,
It seems you reply to the wrong email thread. They are not related to my patch.

These reply to Parva's comments.
@Parva, if you want to discuss more about live migration, please reply 
in my thread, lets don't flood here.


On 2023/9/20 15:06, Zhu, Lingshan wrote:


On 9/20/2023 2:58 PM, Parav Pandit wrote:

From: Chen, Jiqian 
Sent: Wednesday, September 20, 2023 12:03 PM
If driver write 0 to reset device, can the SUSPEND bit be cleared?

It must as reset operation, resets everything else and so the suspend too.


(pci_pm_resume->virtio_pci_restore->virtio_device_restore-

virtio_reset_device)

If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
the reset request is from guest restore process or not, and then I can't change
the reset behavior.

Reset should not be influenced by suspend.
Suspend should do the work of suspend and reset to do the reset.

The problem to overcome in [1] is, resume operation needs to be synchronous as 
it involves large part of context to resume back, and hence just asynchronously 
setting DRIVER_OK is not enough.
The sw must verify back that device has resumed the operation and ready to 
answer requests.

this is not live migration, all device status and other information still stay in the 
device, no need to "resume" context, just resume running.

Like resume from a failed LM.

This is slightly different flow than setting the DRIVER_OK for the first time 
device initialization sequence as it does not involve large restoration.

So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should 
clear the SUSPEND bit and verify that it is out of SUSPEND.

Because driver is still in _OK_ driving the device flipping the SUSPEND bit.

Please read the spec, it says:
The driver MUST NOT clear a device status bit





-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 3:10 PM, Parav Pandit wrote:

From: Zhu, Lingshan 
Sent: Wednesday, September 20, 2023 12:37 PM

The problem to overcome in [1] is, resume operation needs to be synchronous

as it involves large part of context to resume back, and hence just
asynchronously setting DRIVER_OK is not enough.

The sw must verify back that device has resumed the operation and ready to

answer requests.
this is not live migration, all device status and other information still stay 
in the
device, no need to "resume" context, just resume running.


I am aware that it is not live migration. :)

"Just resuming" involves lot of device setup task. The device implementation 
does not know for how long a device is suspended.
So for example, a VM is suspended for 6 hours, hence the device context could 
be saved in a slow disk.
Hence, when the resume is done, it needs to setup things again and driver got 
to verify before accessing more from the device.
The restore procedures should perform by the hypervisor and done before 
set DRIVER_OK and wake up the guest.

And the hypervisor/driver needs to check the device status by re-reading.
  

Like resume from a failed LM.

This is slightly different flow than setting the DRIVER_OK for the first time

device initialization sequence as it does not involve large restoration.

So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver

should clear the SUSPEND bit and verify that it is out of SUSPEND.

Because driver is still in _OK_ driving the device flipping the SUSPEND bit.

Please read the spec, it says:
The driver MUST NOT clear a device status bit


Yes, this is why either DRIER_OK validation by the driver is needed or Jiqian's 
synchronous new register..

so re-read





-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Chen, Jiqian
Hi Lingshan,
It seems you reply to the wrong email thread. They are not related to my patch.

On 2023/9/20 15:06, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 2:58 PM, Parav Pandit wrote:
>>> From: Chen, Jiqian 
>>> Sent: Wednesday, September 20, 2023 12:03 PM
>>> If driver write 0 to reset device, can the SUSPEND bit be cleared?
>> It must as reset operation, resets everything else and so the suspend too.
>>
>>> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
 virtio_reset_device)
>>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge 
>>> if
>>> the reset request is from guest restore process or not, and then I can't 
>>> change
>>> the reset behavior.
>> Reset should not be influenced by suspend.
>> Suspend should do the work of suspend and reset to do the reset.
>>
>> The problem to overcome in [1] is, resume operation needs to be synchronous 
>> as it involves large part of context to resume back, and hence just 
>> asynchronously setting DRIVER_OK is not enough.
>> The sw must verify back that device has resumed the operation and ready to 
>> answer requests.
> this is not live migration, all device status and other information still 
> stay in the device, no need to "resume" context, just resume running.
> 
> Like resume from a failed LM.
>>
>> This is slightly different flow than setting the DRIVER_OK for the first 
>> time device initialization sequence as it does not involve large restoration.
>>
>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver 
>> should clear the SUSPEND bit and verify that it is out of SUSPEND.
>>
>> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
> 
> 

-- 
Best regards,
Jiqian Chen.


[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Chen, Jiqian
Hi Lingshan,

On 2023/9/20 14:58, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 2:33 PM, Chen, Jiqian wrote:
>> Hi Lingshan,
>>
>> On 2023/9/20 13:59, Zhu, Lingshan wrote:
>>>
>>> On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
 On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
> When guest vm does S3, Qemu will reset and clear some things of virtio
> devices, but guest can't aware that, so that may cause some problems.
> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
> resume, that function will destroy render resources of virtio-gpu. As
> a result, after guest resume, the display can't come back and we only
> saw a black screen. Due to guest can't re-create all the resources, so
> we need to let Qemu not to destroy them when S3.
>
> For above purpose, we need a mechanism that allows guests and QEMU to
> negotiate their reset behavior. So this patch add a new parameter
> named freeze_mode to struct virtio_pci_common_cfg. And when guest
> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
> devices can change their reset behavior on Qemu side according to
> freeze_mode. What's more, freeze_mode can be used for all virtio
> devices to affect the behavior of Qemu, not just virtio gpu device.
>>> Hi Jiqian,
>>>
>>> Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq 
>>> state
>>> https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b55...@intel.com/T/
>>>
>>> We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
>>> negotiated, the driver can set SUSPEND in the device status to suspend the
>>> device.
>>>
>>> When SUSPEND, the device should pause its operations and preserve its 
>>> configurations
>>> in its configuration space.
>>>
>>> The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes 
>>> running.
>>>
>>> This is originally to serve live migration, but I think it can also meet 
>>> your needs.
>> I noticed your series, but I am not sure they are also meet my needs.
>> If driver write 0 to reset device, can the SUSPEND bit be cleared? 
>> (pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
> if the driver writes 0, it resets all virtio functionalities. So SUSPEND is 
> cleared.
Then your patches are not meet my needs. In my scene, it needs to keep the 
SUSPEND bit util the resume process is complete.
Because in my virtio-gpu scene, when guest resume, it call virtio_reset_device 
to clear all device status bits, and then reset virtio-gpu in Qemu, and then 
destroy render resources, I don't want the resources are destroyed during the 
resume process. So, I add freeze_mode to tell Qemu that guest is doing S3 and 
resources need to be kept.

> device reset can also be used to recover the device from fatal errors, so it 
> should reset everything in virtio.
>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge 
>> if the reset request is from guest restore process or not, and then I can't 
>> change the reset behavior.
> I think when enter S3, the hypervisor/driver should set SUSPEND to the 
> device. And when resume from S3, the hypervisor/driver should
> re-write DRIVER_OK to clear SUSPEND, then the device resume running.
>> Can you send me your patch link on kernel and qemu side? I will take a deep 
>> look.
> There are no patches for qemu/kernel yet, spec first.
>>
>>> Thanks,
>>> Zhu Lingshan
> Signed-off-by: Jiqian Chen 
> ---
>    transport-pci.tex | 7 +++
>    1 file changed, 7 insertions(+)
>
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..2543536 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure 
> layout}\label{sec:Virtio Transport
>    le64 queue_desc;    /* read-write */
>    le64 queue_driver;  /* read-write */
>    le64 queue_device;  /* read-write */
> +    le16 freeze_mode;   /* read-write */
>    le16 queue_notif_config_data;   /* read-only for driver */
>    le16 queue_reset;   /* read-write */
>
 we can't add fields in the middle of the structure like this -
 offset of queue_notif_config_data and queue_reset changes.

   
> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure 
> layout}\label{sec:Virtio Transport
>    \item[\field{queue_device}]
>    The driver writes the physical address of Device Area here.  
> See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>    +\item[\field{freeze_mode}]
> +    The driver writes this to set the freeze mode of virtio pci.
> +    VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> +    VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing 

RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Parav Pandit

> From: Zhu, Lingshan 
> Sent: Wednesday, September 20, 2023 12:37 PM

> > The problem to overcome in [1] is, resume operation needs to be synchronous
> as it involves large part of context to resume back, and hence just
> asynchronously setting DRIVER_OK is not enough.
> > The sw must verify back that device has resumed the operation and ready to
> answer requests.
> this is not live migration, all device status and other information still 
> stay in the
> device, no need to "resume" context, just resume running.
> 
I am aware that it is not live migration. :)

"Just resuming" involves lot of device setup task. The device implementation 
does not know for how long a device is suspended.
So for example, a VM is suspended for 6 hours, hence the device context could 
be saved in a slow disk.
Hence, when the resume is done, it needs to setup things again and driver got 
to verify before accessing more from the device.
 
> Like resume from a failed LM.
> >
> > This is slightly different flow than setting the DRIVER_OK for the first 
> > time
> device initialization sequence as it does not involve large restoration.
> >
> > So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver
> should clear the SUSPEND bit and verify that it is out of SUSPEND.
> >
> > Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
> 
Yes, this is why either DRIER_OK validation by the driver is needed or Jiqian's 
synchronous new register..



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 2:58 PM, Parav Pandit wrote:

From: Chen, Jiqian 
Sent: Wednesday, September 20, 2023 12:03 PM
If driver write 0 to reset device, can the SUSPEND bit be cleared?

It must as reset operation, resets everything else and so the suspend too.


(pci_pm_resume->virtio_pci_restore->virtio_device_restore-

virtio_reset_device)

If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
the reset request is from guest restore process or not, and then I can't change
the reset behavior.

Reset should not be influenced by suspend.
Suspend should do the work of suspend and reset to do the reset.

The problem to overcome in [1] is, resume operation needs to be synchronous as 
it involves large part of context to resume back, and hence just asynchronously 
setting DRIVER_OK is not enough.
The sw must verify back that device has resumed the operation and ready to 
answer requests.
this is not live migration, all device status and other information 
still stay in the device, no need to "resume" context, just resume running.


Like resume from a failed LM.


This is slightly different flow than setting the DRIVER_OK for the first time 
device initialization sequence as it does not involve large restoration.

So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should 
clear the SUSPEND bit and verify that it is out of SUSPEND.

Because driver is still in _OK_ driving the device flipping the SUSPEND bit.

Please read the spec, it says:
The driver MUST NOT clear a device status bit



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Zhu, Lingshan




On 9/20/2023 2:33 PM, Chen, Jiqian wrote:

Hi Lingshan,

On 2023/9/20 13:59, Zhu, Lingshan wrote:


On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:

On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:

When guest vm does S3, Qemu will reset and clear some things of virtio
devices, but guest can't aware that, so that may cause some problems.
For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
resume, that function will destroy render resources of virtio-gpu. As
a result, after guest resume, the display can't come back and we only
saw a black screen. Due to guest can't re-create all the resources, so
we need to let Qemu not to destroy them when S3.

For above purpose, we need a mechanism that allows guests and QEMU to
negotiate their reset behavior. So this patch add a new parameter
named freeze_mode to struct virtio_pci_common_cfg. And when guest
suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
devices can change their reset behavior on Qemu side according to
freeze_mode. What's more, freeze_mode can be used for all virtio
devices to affect the behavior of Qemu, not just virtio gpu device.

Hi Jiqian,

Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq 
state
https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b55...@intel.com/T/

We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
negotiated, the driver can set SUSPEND in the device status to suspend the
device.

When SUSPEND, the device should pause its operations and preserve its 
configurations
in its configuration space.

The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes running.

This is originally to serve live migration, but I think it can also meet your 
needs.

I noticed your series, but I am not sure they are also meet my needs.
If driver write 0 to reset device, can the SUSPEND bit be cleared? 
(pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
if the driver writes 0, it resets all virtio functionalities. So SUSPEND 
is cleared.
device reset can also be used to recover the device from fatal errors, 
so it should reset everything in virtio.

If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if 
the reset request is from guest restore process or not, and then I can't change 
the reset behavior.
I think when enter S3, the hypervisor/driver should set SUSPEND to the 
device. And when resume from S3, the hypervisor/driver should

re-write DRIVER_OK to clear SUSPEND, then the device resume running.

Can you send me your patch link on kernel and qemu side? I will take a deep 
look.

There are no patches for qemu/kernel yet, spec first.



Thanks,
Zhu Lingshan

Signed-off-by: Jiqian Chen 
---
   transport-pci.tex | 7 +++
   1 file changed, 7 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..2543536 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -319,6 +319,7 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
   le64 queue_desc;    /* read-write */
   le64 queue_driver;  /* read-write */
   le64 queue_device;  /* read-write */
+    le16 freeze_mode;   /* read-write */
   le16 queue_notif_config_data;   /* read-only for driver */
   le16 queue_reset;   /* read-write */


we can't add fields in the middle of the structure like this -
offset of queue_notif_config_data and queue_reset changes.

   

@@ -393,6 +394,12 @@ \subsubsection{Common configuration structure 
layout}\label{sec:Virtio Transport
   \item[\field{queue_device}]
   The driver writes the physical address of Device Area here.  See 
section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
   +\item[\field{freeze_mode}]
+    The driver writes this to set the freeze mode of virtio pci.
+    VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
+    VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and 
virtio-pci enters S3 suspension;
+    Other values are reserved for future use, like S4, etc.
+

we need to specify these values then.

we also need
- feature bit to detect support for S3
- conformance statements documenting behavious under S3



   \item[\field{queue_notif_config_data}]
   This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been 
negotiated.
   The driver will use this value when driver sends available buffer
--
2.34.1

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: 

RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Parav Pandit

> From: Chen, Jiqian 
> Sent: Wednesday, September 20, 2023 12:03 PM

> If driver write 0 to reset device, can the SUSPEND bit be cleared?
It must as reset operation, resets everything else and so the suspend too.

> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
> >virtio_reset_device)
> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
> the reset request is from guest restore process or not, and then I can't 
> change
> the reset behavior.
Reset should not be influenced by suspend.
Suspend should do the work of suspend and reset to do the reset.

The problem to overcome in [1] is, resume operation needs to be synchronous as 
it involves large part of context to resume back, and hence just asynchronously 
setting DRIVER_OK is not enough.
The sw must verify back that device has resumed the operation and ready to 
answer requests.

This is slightly different flow than setting the DRIVER_OK for the first time 
device initialization sequence as it does not involve large restoration.

So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should 
clear the SUSPEND bit and verify that it is out of SUSPEND.

Because driver is still in _OK_ driving the device flipping the SUSPEND bit.


Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Chen, Jiqian
Hi Lingshan,

On 2023/9/20 13:59, Zhu, Lingshan wrote:
> 
> 
> On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
>> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>>> When guest vm does S3, Qemu will reset and clear some things of virtio
>>> devices, but guest can't aware that, so that may cause some problems.
>>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>>> resume, that function will destroy render resources of virtio-gpu. As
>>> a result, after guest resume, the display can't come back and we only
>>> saw a black screen. Due to guest can't re-create all the resources, so
>>> we need to let Qemu not to destroy them when S3.
>>>
>>> For above purpose, we need a mechanism that allows guests and QEMU to
>>> negotiate their reset behavior. So this patch add a new parameter
>>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>>> devices can change their reset behavior on Qemu side according to
>>> freeze_mode. What's more, freeze_mode can be used for all virtio
>>> devices to affect the behavior of Qemu, not just virtio gpu device.
> Hi Jiqian,
> 
> Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq 
> state
> https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b55...@intel.com/T/
> 
> We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
> negotiated, the driver can set SUSPEND in the device status to suspend the
> device.
> 
> When SUSPEND, the device should pause its operations and preserve its 
> configurations
> in its configuration space.
> 
> The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes running.
> 
> This is originally to serve live migration, but I think it can also meet your 
> needs.
I noticed your series, but I am not sure they are also meet my needs.
If driver write 0 to reset device, can the SUSPEND bit be cleared? 
(pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if 
the reset request is from guest restore process or not, and then I can't change 
the reset behavior.
Can you send me your patch link on kernel and qemu side? I will take a deep 
look.

> 
> Thanks,
> Zhu Lingshan
>>>
>>> Signed-off-by: Jiqian Chen 
>>> ---
>>>   transport-pci.tex | 7 +++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/transport-pci.tex b/transport-pci.tex
>>> index a5c6719..2543536 100644
>>> --- a/transport-pci.tex
>>> +++ b/transport-pci.tex
>>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure 
>>> layout}\label{sec:Virtio Transport
>>>   le64 queue_desc;    /* read-write */
>>>   le64 queue_driver;  /* read-write */
>>>   le64 queue_device;  /* read-write */
>>> +    le16 freeze_mode;   /* read-write */
>>>   le16 queue_notif_config_data;   /* read-only for driver */
>>>   le16 queue_reset;   /* read-write */
>>>
>> we can't add fields in the middle of the structure like this -
>> offset of queue_notif_config_data and queue_reset changes.
>>
>>   
>>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure 
>>> layout}\label{sec:Virtio Transport
>>>   \item[\field{queue_device}]
>>>   The driver writes the physical address of Device Area here.  See 
>>> section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>>   +\item[\field{freeze_mode}]
>>> +    The driver writes this to set the freeze mode of virtio pci.
>>> +    VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>>> +    VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and 
>>> virtio-pci enters S3 suspension;
>>> +    Other values are reserved for future use, like S4, etc.
>>> +
>> we need to specify these values then.
>>
>> we also need
>> - feature bit to detect support for S3
>> - conformance statements documenting behavious under S3
>>
>>
>>>   \item[\field{queue_notif_config_data}]
>>>   This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been 
>>> negotiated.
>>>   The driver will use this value when driver sends available buffer
>>> -- 
>>> 2.34.1
>>
>> This publicly archived list offers a means to provide input to the
>> OASIS Virtual I/O Device (VIRTIO) TC.
>>
>> In order to verify user consent to the Feedback License terms and
>> to minimize spam in the list archive, subscription is required
>> before posting.
>>
>> Subscribe: virtio-comment-subscr...@lists.oasis-open.org
>> Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
>> List help: virtio-comment-h...@lists.oasis-open.org
>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>>