[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Mon, May 15, 2023 at 03:59:41PM +, Parav Pandit wrote:
> 
> > From: Jason Wang 
> > Sent: Monday, May 15, 2023 3:31 AM
> 
> > > What do we gain from bisecting it?
> > 
> > 
> > 1) No need to deal with the offset in the hardware
> > 
In the context of legacy it doesn't matter much given that it's over AQ.

Let me give you an example. I feel that device specific
config should allow arbitrary length accesses so that e.g.
mac can be read and written atomically.

On the other hand, this is not the case for common config.

I feel we do not need to allow e.g. access to both common config
and device specific one in a single operation, that is
just messy. Now sure, we can add text with an explanation,
but if instead there's a flag saying "this is common cfg"
or "this is device cfg" then it fall out automatically.

Is this a deal breaker? No, but note how neither I nor Jason thought
about this ahead of time, and the correct usage just falls out
of the interface automatically.  This is a mark of a good interface
design. Lots of corner cases that one has to be careful around are not.
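To make the distinction concrete, here is a minimal sketch of what such a
flagged access command could look like (all names, sizes and the length rule
below are illustrative, not taken from any posted proposal):

#include <stdint.h>

/* The flag, not offset arithmetic, decides which structure is targeted
 * and which length rule applies. */
#define CFG_SPACE_COMMON  0   /* common config: 1/2/4-byte accesses only */
#define CFG_SPACE_DEVICE  1   /* device config: arbitrary length, e.g. a 6-byte MAC */

struct cfg_access_cmd {
        uint8_t  cfg_space;   /* CFG_SPACE_COMMON or CFG_SPACE_DEVICE */
        uint8_t  is_write;
        uint16_t length;      /* bytes to transfer */
        uint32_t offset;      /* offset within the selected structure */
        /* 'length' bytes of data follow for a write */
};

static int cfg_access_length_ok(const struct cfg_access_cmd *cmd)
{
        if (cmd->cfg_space == CFG_SPACE_COMMON)
                return cmd->length == 1 || cmd->length == 2 || cmd->length == 4;
        return cmd->length >= 1;  /* device config: allow atomic multi-byte access */
}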

-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH] content: Replace guest OS with driver

2023-05-15 Thread Michael S. Tsirkin
On Tue, May 16, 2023 at 06:01:39AM +0300, Parav Pandit wrote:
> Currently device status field description and driver requirements
> section uses mix of terminology for the driver. These two sections
> sometimes call the driver as 'the guest OS' or 'the driver'.
> 
> Most of the cleanup around 'guest OS' was already done around commit
> 212c0cf3 in the past. Clean up the remaining few references to just
> refer to it as 'the driver'.
> 
> This is an editorial change.

No, editorial changes are things like formatting, correcting cross
references, resolving simple patch conflicts.  We also have a minor
cleanups rule including spelling and typos.  I feel we've been through
this discussion, no?

I'm insisting on clarifying this because you want to be an editor, and
we could benefit from more editors, but given editors have commit access
it's important to be clear what the role of an editor is, which is not
to make decisions about content - it's a technical role.


> Signed-off-by: Parav Pandit 
> ---
>  content.tex | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index 9df81b8..417d476 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -26,10 +26,10 @@ \section{\field{Device Status} Field}\label{sec:Basic 
> Facilities of a Virtio Dev
>  following bits are defined (listed below in the order in which
>  they would be typically set):
>  \begin{description}
> -\item[ACKNOWLEDGE (1)] Indicates that the guest OS has found the
> +\item[ACKNOWLEDGE (1)] Indicates that the driver has found the
>device and recognized it as a valid virtio device.
>  
> -\item[DRIVER (2)] Indicates that the guest OS knows how to drive the
> +\item[DRIVER (2)] Indicates that the driver knows how to drive the
>device.
>\begin{note}
>  There could be a significant (or infinite) delay before setting

Actually, there is a subtle difference here that this is losing.
"guest OS" really refers to e.g. Linux virtio core code here.


ACKNOWLEDGE and DRIVER are used by the virtio core.

ACKNOWLEDGE tells you the virtio core attached to the device, and DRIVER
tells you the core found a device-specific driver.



If you really want to make things better, let's find a way to explain
all this.
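As a rough illustration of that split (the helper and constant names below are
made up for the example; the real code lives in the Linux virtio core and in
the device-specific drivers):

#include <stdint.h>

enum { ACKNOWLEDGE = 1, DRIVER = 2 };

struct virtio_device { uint8_t status; };

/* Hypothetical helper: OR a bit into the device status field. */
static void add_status(struct virtio_device *dev, uint8_t bit)
{
        dev->status |= bit;
}

/* Virtio core (the old "guest OS" wording): a valid virtio device was
 * found on the bus, before any net/blk/... driver is involved. */
static void core_register_device(struct virtio_device *dev)
{
        add_status(dev, ACKNOWLEDGE);
}

/* Virtio core again: a device-specific driver matched; only now is
 * DRIVER set, and the device-specific probe runs after this. */
static void core_bind_driver(struct virtio_device *dev)
{
        add_status(dev, DRIVER);
}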





> @@ -473,13 +473,13 @@ \section{Device Initialization}\label{sec:General 
> Initialization And Device Oper
>  \begin{enumerate}
>  \item Reset the device.
>  
> -\item Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> +\item Set the ACKNOWLEDGE status bit: the driver has noticed the device.
>  
> -\item Set the DRIVER status bit: the guest OS knows how to drive the device.
> +\item Set the DRIVER status bit: the driver knows how to drive the device.

Besides the above, "the driver knows how to drive" sounds bad.

>  \item\label{itm:General Initialization And Device Operation /
>  Device Initialization / Read feature bits} Read device feature bits, and 
> write the subset of feature bits
> -   understood by the OS and driver to the device.  During this step the
> +   understood by the driver to the device.  During this step the

Again the "the OS" here referred to core virtio (e.g. ring features).
Less of a problem to remove but if we come up with
a better terminology for ACKNOWLEDGE/DRIVER then I guess we can use it
here, too.


> driver MAY read (but MUST NOT write) the device-specific configuration 
> fields to check that it can support the device before accepting it.
>  
>  \item\label{itm:General Initialization And Device Operation / Device 
> Initialization / Set FEATURES-OK} Set the FEATURES_OK status bit.  The driver 
> MUST NOT accept
> -- 
> 2.26.2


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Tue, May 16, 2023 at 11:43:09AM +0800, Jason Wang wrote:
> On Tue, May 16, 2023 at 11:37 AM Jason Wang  wrote:
> >
> >
> > On 2023/5/16 02:01, Michael S. Tsirkin wrote:
> > > On Mon, May 15, 2023 at 06:00:02PM +, Parav Pandit wrote:
> > >>> From: Michael S. Tsirkin 
> > >>> Sent: Monday, May 15, 2023 1:56 PM
> > >>>
> > >>> On Mon, May 15, 2023 at 05:51:05PM +, Parav Pandit wrote:
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, May 15, 2023 1:45 PM
> > >
> > > On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:
> > >> All legacy interface via AQ.
> > >> All modern interface access via PCI or its own transport between
> > >> driver and
> > > device.
> > >
> > > I am wondering however about the hypervisor notifications.
> > > Generally these are the most problematic aspect here I feel.  For
> > > example, how does it interact with VIRTIO_F_NOTIF_CONFIG_DATA?  And
> > > generally, having both guest and host be allowed to access device's 
> > > BAR
> > >>> seems problematic.
> > > Can we reserve a PF BAR region for these things or is that too 
> > > expensive?
> >  VIRTIO_F_NOTIF_CONFIG_DATA is not present for the legacy.
> > >>> it is not but it just might be required, or it won't be there.
> > >>>
> > >> How can it be required if it is not part of it?
> > >> I likely didn't follow your question/comment, can you please explain.
> > > VIRTIO_F_NOTIF_CONFIG_DATA is there presumably for a reason.
> > >
> >  For modern device, guest driver will access the device own BAR 
> >  directly with
> > >>> its own config_data anyway.
> >  Should we reserve a region in the PF BAR for SF/SIOV device?, should 
> >  device
> > >>> report which BAR/area to use for the given SF/SIOV device?
> >  May be yes, those are different discussions with tradeoff to consider 
> >  during
> > >>> SIOV discussions. It is not related to VFs.
> > >>>
> > >>> For SIOV it's a given. But we can do this for SRIOV: reserve a PF 
> > >>> region per VF.
> > >> Each VF has its own BAR area for driver notifications.
> > > But it is passed through to guest presumably.
> >
> >
> > Probably not in the case of VIRTIO_F_NOTIF_CONFIG_DATA. Did you see any
> > problem if we do mediation here except for some performance penalty?
> 
> Ok, rethink of this, when using legacy, VIRTIO_F_NOTIF_CONFIG_DATA
> should be assumed to be not negotiated. So I think I agree with Parav,
> it should not be a problem.
> 
> Or do you mean for the devices that have some mandated features?
> 
> Thanks

Exactly. VIRTIO_F_NOTIF_CONFIG_DATA, if present, is likely to be
required for the device to work, as opposed to being an optimization.
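For reference, a sketch of what the feature changes on the driver side
(simplified, modern PCI transport assumed; the field names are illustrative):

#include <stdint.h>

struct vq_notify_info {
        volatile uint16_t *notify_addr;   /* derived from queue_notify_off and the BAR */
        uint16_t index;                   /* virtqueue index */
        uint16_t notify_config_data;      /* value supplied by the device */
        int has_notif_config_data;        /* VIRTIO_F_NOTIF_CONFIG_DATA negotiated? */
};

static void vq_notify(struct vq_notify_info *vq)
{
        /* Without the feature the driver writes the vq index; with it, the
         * driver must write the device-supplied value instead, which is why
         * the feature may be required for the device to work at all. */
        uint16_t value = vq->has_notif_config_data ? vq->notify_config_data
                                                   : vq->index;
        *vq->notify_addr = value;
}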

> >
> > Thanks
> >
> >
> > >


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Mon, May 15, 2023 at 08:56:42PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, May 15, 2023 4:30 PM
> > >
> > > I am not sure if this is a real issue. Because even the legacy guests
> > > have msix enabled by default. In theory yes, it can fall back to intx.
> > 
> > Well. I feel we should be closer to being sure it's not an issue if we are 
> > going to
> > ignore it.
> > some actual data here:
> > 
> > Even linux only enabled MSI-X in 2009.
> > Of course, other guests took longer. E.g.
> > a quick google search gave me this for some bsd variant (2017):
> > https://twitter.com/dragonflybsd/status/834494984229421057
> > 
> > Many guests have tunables to disable msix. Why?
> > E.g. BSD keeps maintaining it at
> >  hw.virtio.pci.disable_msix
> > not a real use-case and you know 100% no guests have set this to work around
> > some bug e.g. in bsd MSI-X core?  How can you be sure?
> > 
> > 
> > 
> > intx is used when guests run out of legacy interrupts, these setups are not 
> > hard
> > to create at all: just constrain the number of vCPUs while creating lots of
> > devices.
> > 
> > 
> > I could go on.
> > 
> > 
> > 
> > > There are few options.
> > > 1. A hypervisor driver can be conservative and steal an msix of the VF
> > > for transporting intx.
> > > Pros: Does not need special things in device
> > > Cons:
> > > a. Fairly intrusive in hypervisor vf driver.
> > > b. May not be ever used as guest is unlikely to fail on msix
> > 
> > Yea I do not like this since we are burning up msix vectors.
> > More reasons: this "pass through" msix has no chance to set ISR properly 
> > since
> > msix does not set ISR.
> > 
> > 
> > > 2. Since multiple VFs intx to be serviced, one command per VF in AQ is
> > > too much overhead that device needs to map a request to,
> > >
> > > A better way is to have an eventq of depth = num_vfs, like many other
> > > virtio devices have it.
> > >
> > > An eventq can hold per VF interrupt entry including the isr value that
> > > you suggest above.
> > >
> > > Something like,
> > >
> > > union eventq_entry {
> > >   u8 raw_data[16];
> > >   struct intx_entry {
> > >   u8 event_opcode;
> > >   u8 group_type;
> > >   u8 reserved[6];
> > >   le64 group_identifier;
> > >   u8 isr_status;
> > >   };
> > > };
> > >
> > > This eventq resides on the owner parent PF.
> > > isr_status is read on clear like today.
> > 
> > This is what I wrote no?
> > lore.kernel.org/all/20230507050146-mutt-send-email-
> > mst%40kernel.org/t.mbox.gz
> > 
> > how about a special command that is used when device would
> > normally send INT#x? it can also return ISR to reduce latency.
> > 
> In response to your above suggestion of AQ command,
> I suggested the eventq that contains the isr_status that reduces latency as 
> you suggest.

I don't see why we need to keep adding queues though.
Just use one of admin queues.

> > > May be such eventq can be useful in future for wider case.
> > 
> > There's no maybe here is there? Things like live migration need events for 
> > sure.
> > 
> > > We may have to find a different name for it as other devices has
> > > device specific eventq.
> > 
> > We don't need a special name for it. Just use an adminq with a special
> > command that is only consumed when there is an event.
> This requires too many commands to be issued on the PF device.
> Potentially one per VF. And device needs to keep track of command to VF 
> mapping.
>
> > Note you only need to queue a command if MSI is disabled.
> > Which is nice.
> Yes, it is nice.
> An eventq is a variation of it, where device can keep reporting the events 
> without doing the extra mapping and without too many commands.

I don't get the difference then. The format you showed seems very close
to an admin command. What is the difference? How do you avoid the need
for a command per VF using INTx#?
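To make the comparison concrete, a rough sketch of the two shapes being
discussed (purely illustrative; neither layout is spec text):

#include <stdint.h>

/* Eventq variant: the device pushes one of these per INT#x assertion. */
struct intx_event {
        uint8_t  event_opcode;      /* "INTx asserted" */
        uint8_t  group_type;        /* SR-IOV group */
        uint8_t  reserved[6];
        uint64_t group_identifier;  /* which VF */
        uint8_t  isr_status;        /* read-on-clear ISR value */
};

/* Admin-command variant: the hypervisor pre-posts a command per VF that
 * has MSI-X disabled; the device completes it only when it would have
 * sent INT#x, returning essentially the same information. */
struct intx_admin_cmd_result {
        uint64_t group_member_id;   /* which VF */
        uint8_t  isr_status;
};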


> Additionally, eventq also works for 1.x device which will read the ISR status 
> registers directly from the device.
> 
> > 
> > > I am inclined to differ this to a later point if one can identify the
> > > real failure with msix for the guest VM.
> > > So far we don't see this ever happening.
> > 
> > What is the question exactly?
> > 
> > Just have more devices than vectors,
> > an intel CPU only has ~200 of these, and current drivers want to use 2 
> > vectors
> > and then fall back on INTx since that is shared.
> > Extremely easy to create - do you want a qemu command line to try?
> > 
> Intel CPU has 256 per core (per vcpu). So they are really a lot.
> One needs to connect lot more devices to the cpu to run out of it.
> So yes, I would like to try the command to fail.

Order of 128 functions then for a 1-vCPU VM. You were previously talking
about tens of thousands of functions as justification for avoiding config
space.


> > Do specific customers even use guests with msi-x disabled? Maybe no.
> > Does anyone use virtio with msi-x disabled? Most likely yes.

[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Tue, May 16, 2023 at 11:28:04AM +0800, Jason Wang wrote:
> > 
> > > Admin virtqueue doesn't preventing anything from letting IMS go directly 
> > > to the
> > > device.
> > But there is no aq/cfgq/cmdq defined for the guest, so with current 
> > proposal it is prevented.
> 
> 
> It's a choice of the implementation but not spec. Spec doesn't say adminq
> can't be used for guest.

It actually kind of does. The adminq controls group members through the
group owner; the owner is not itself a member. For a guest controlling a
member to use the adminq you would have to also pass the owner to the guest.
Possible, but not a standard architecture.
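For context, the admin command is addressed roughly like this (a sketch based
on the admin virtqueue proposal; the exact reserved-field sizes may differ),
which is why a guest that only sees the member has nothing to submit it on:

#include <stdint.h>

struct virtio_admin_cmd_hdr {
        uint16_t opcode;
        uint16_t group_type;        /* e.g. the SR-IOV PF/VF group */
        uint8_t  reserved[12];
        uint64_t group_member_id;   /* the member (VF) the command acts on */
        /* command-specific data follows; the command itself is submitted
         * on the group owner's (PF's) admin virtqueue */
};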

-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] [PATCH] content: Replace guest OS with driver

2023-05-15 Thread Jason Wang
On Tue, May 16, 2023 at 11:02 AM Parav Pandit  wrote:
>
> Currently device status field description and driver requirements
> section uses mix of terminology for the driver. These two sections
> sometimes call the driver as 'the guest OS' or 'the driver'.
>
> Most of the cleanup around 'guest OS' was already done around commit
> 212c0cf3 in the past. Clean up the remaining few references to just
> refer to it as 'the driver'.
>
> This is an editorial change.
>
> Signed-off-by: Parav Pandit 

Do we plan to replace the "hypervisor" with "device" as well?

Thanks

> ---
>  content.tex | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/content.tex b/content.tex
> index 9df81b8..417d476 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -26,10 +26,10 @@ \section{\field{Device Status} Field}\label{sec:Basic 
> Facilities of a Virtio Dev
>  following bits are defined (listed below in the order in which
>  they would be typically set):
>  \begin{description}
> -\item[ACKNOWLEDGE (1)] Indicates that the guest OS has found the
> +\item[ACKNOWLEDGE (1)] Indicates that the driver has found the
>device and recognized it as a valid virtio device.
>
> -\item[DRIVER (2)] Indicates that the guest OS knows how to drive the
> +\item[DRIVER (2)] Indicates that the driver knows how to drive the
>device.
>\begin{note}
>  There could be a significant (or infinite) delay before setting
> @@ -473,13 +473,13 @@ \section{Device Initialization}\label{sec:General 
> Initialization And Device Oper
>  \begin{enumerate}
>  \item Reset the device.
>
> -\item Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> +\item Set the ACKNOWLEDGE status bit: the driver has noticed the device.
>
> -\item Set the DRIVER status bit: the guest OS knows how to drive the device.
> +\item Set the DRIVER status bit: the driver knows how to drive the device.
>
>  \item\label{itm:General Initialization And Device Operation /
>  Device Initialization / Read feature bits} Read device feature bits, and 
> write the subset of feature bits
> -   understood by the OS and driver to the device.  During this step the
> +   understood by the driver to the device.  During this step the
> driver MAY read (but MUST NOT write) the device-specific configuration 
> fields to check that it can support the device before accepting it.
>
>  \item\label{itm:General Initialization And Device Operation / Device 
> Initialization / Set FEATURES-OK} Set the FEATURES_OK status bit.  The driver 
> MUST NOT accept
> --
> 2.26.2
>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscr...@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
> List help: virtio-comment-h...@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang
On Tue, May 16, 2023 at 11:45 AM Parav Pandit  wrote:
>
>
> > From: Jason Wang 
> > Sent: Monday, May 15, 2023 11:28 PM
> >
> > Hi Parav:
> >
> > On 2023/5/15 23:49, Parav Pandit wrote:
> > >> From: Jason Wang 
> > >> Sent: Monday, May 15, 2023 3:11 AM
> > >>> It is just bunch of admin commands in the new SIOV or virtual virtio
> > >>> device
> > >> chapter.
> > >>> It is not a transport chapter.
> > >>
> > >> Provisioning could be a part of the transport. For example SR-IOV
> > >> allows static and dynamic provisioning. If a virtio device needs to
> > >> be implemented in VF, it needs to be provisioned first.
> > >>
> > > Provisioning a device by admin != transport guest driver <-> device
> > communication.
> > >
> > > Two very different things.
> >
> >
> > When developing a new transport, we need to consider at least how to probe
> > the device. Before probe the device, it needs to be provisioned.
> Yes, and yes to both. I agree.
>
> > If we want a new transport, it could implement the provisioning especially 
> > if it
> > want the design to be bus independent. For the proposal like transport
> > virtqueue, it's also fine to use bus specific facility to provision and 
> > probe the
> > device.
> >
> Provisioning is for defining the transport specific attributes.
> Each transport has few command things and few unique things.
> So provisioning commands can define transport specific attributes and few 
> generic things.
> This is fine.
>
> >
> > That is what I want to say.
> >
> >
> > >
> > > And the motivation is also clear is to provide composing a virtio
> > > device for the
> >  guest VM for the backward compatibility for 1.x part of the 
> >  specification.
> > > It seems fine and indeed orthogonal to me that: it is for backward
> >  compatibility for already defined config fields for existing guest VM 
> >  driver.
> > > It does not conflict with the cfgq/cmdq idea whose main purpose is
> > > for the
> >  new config fields, new use cases that doesn't require any mediation.
> > > Such cfgq works across PF, VF, SF/SIOV devices in uniform way
> > > without
> >  mediation.
> > 
> >  My understanding is that the cfgq/cmdq could be done on top, since
> >  the commands are part of transport unless I miss something.
> > 
> > >>> On top of what?
> > >>> cfgq/cmdq is just another queue like any other VQ.
> > >>> it is orthogonal to accessing 1.x registers using group owner's queue.
> > >>
> > >> I think there hasn't been any proposal that call any virtqueue like
> > >> cfgq/cmdq. So using that may cause a lot of misunderstanding. I think
> > >> the context here is to provide or extend transport facilities via a
> > >> virtqueue not something like the control virtqueue.
> > > The context is to compose a PCI device for a guest which for the currently
> > defined features.
> > > Once a new attribute is done using cfgq/cmdq there is no problem of
> > transporting it via its group member.
> > >
> > > And therefore, saying any new attribute also to ride on same tranportq/aq 
> > > via
> > is not appropriate.
> >
> >
> > The point is to try to find if there's a better way, more below.
> >
> >
> > >
> > >> Admin virtqueue doesn't preventing anything from letting IMS go
> > >> directly to the device.
> > > But there is no aq/cfgq/cmdq defined for the guest, so with current 
> > > proposal it
> > is prevented.
> >
> >
> > It's a choice of the implementation but not spec. Spec doesn't say adminq 
> > can't
> > be used for guest.
> >
> For guest and by guest are two different things.
> I am proposing that a cfgq/cmdq should be used directly by the guest driver 
> as base line like any other VQ.

I'm fine with this, but if I understand correctly, I don't see any
proposal like this posted to the list?

> And if there is a need for mediation, it should be possible.
>
> > I think we've discussed many times that legacy is tricky for hardware.
> And we want to proceed for the described use case and requirements.
>
> > And your proposal proves this further by introducing modern alike
> > notification areas.
> Not really. A modern device is offering the notification area like legacy to 
> the legacy interface.
> So this is just fine.

I don't say it's not fine, it's pretty fine, and transport virtqueue
has similar commands. And this also answers your concern that
tunneling PCI over adminq is not scalable. We can keep this command
in that case as well, or invent new capabilities.

>
> > We probably need to deal with others like endianess.
> >
> No point in discussing this again. One size does not fit all.
> It solves large part of the use cases so it is valuable and will be supported.

This is doable, but we need to leave some space for future development. Or
is it just too complicated to have an endianness command?

>
> > For any case, having a simple and deterministic device design is always
> > better assuming we've agreed that mediation is a must for legacy

[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang



On 2023/5/11 21:15, Parav Pandit wrote:

From: Jason Wang 
Sent: Thursday, May 11, 2023 3:05 AM
If we decide to use the admin vq, is there any good reason to tie it to PCI if 
we
don't want to tunneling PCI over adminq?

To truly tunnel PCI over adminq, please tunnel _all_ pci things over AQ, not 
just below 1 to 11.



Just to clarify, 1 to 11 below is not tunneling PCI over adminq; it's the
idea of the transport virtqueue, which is not specific to PCI.


If we want to tunnel PCI over adminq, it needs to be accessed via your BAR
access commands.




Doorbells, capabilities and more. Then it is qualifying as a transport.
Otherwise, it is not a "virtio pci transport over aq".



Yes, it's basically just have some extension on your proposal, adding a 
config space access method that should be fine.





I don't see why one would tunnel the whole of PCI over some queue either, and that was not
your intention from the patches I see.
VQ over fabrics is a much cleaner design for transporting VQs over non-PCI.



That's a different topic, and we don't need to think of the legacy 
support there.




People have tried to tunnel PCI over some other transports and it is only fine for
non-production toys.



I've replied this in another thread.





Why not simply invent individual commands to access legacy facilities,
like what transport virtqueue did for the modern
device?:


I don't see how this is any different from the register-offset interface.
It bisects more things at the hypervisor level, which makes it hard to add a 12th
entry.


1) device features
2) driver features
3) queue address
4) queue size
5) queue select
6) queue notify
7) device status
8) ISR status
9) config msix
10) queue msix
11) device configuration space

It focuses on the facilities instead of transport-specific details like
registers (we don't even need legacy registers in this case). It gives more deterministic
behavior, so we don't need to care about cross-register reads/writes.
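A minimal sketch of what per-facility commands could look like; the opcode
names and values below are hypothetical, chosen only to mirror the 1-11 list
above rather than any posted proposal:

enum legacy_admin_opcode {
        LEGACY_DEVICE_FEATURES_READ   = 0x100,
        LEGACY_DRIVER_FEATURES_WRITE  = 0x101,
        LEGACY_QUEUE_ADDRESS_WRITE    = 0x102,
        LEGACY_QUEUE_SIZE_READ        = 0x103,
        LEGACY_QUEUE_SELECT_WRITE     = 0x104,
        LEGACY_QUEUE_NOTIFY           = 0x105,
        LEGACY_DEVICE_STATUS_RW       = 0x106,
        LEGACY_ISR_STATUS_READ        = 0x107,  /* read-on-clear */
        LEGACY_CONFIG_MSIX_RW         = 0x108,
        LEGACY_QUEUE_MSIX_RW          = 0x109,
        LEGACY_DEVICE_CFG_RW          = 0x10a,  /* device-specific config space */
};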


1.x has these registers at raw level and that seems fine.



Note that 1.x has more, the above is dumped from the section of "Legacy 
Interfaces: A Note on PCI Device Layout".


Thanks



Anything new will come on the cfgq and hence no bisection or registers needed.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit

> From: Jason Wang 
> Sent: Monday, May 15, 2023 11:28 PM
> 
> Hi Parav:
> 
> On 2023/5/15 23:49, Parav Pandit wrote:
> >> From: Jason Wang 
> >> Sent: Monday, May 15, 2023 3:11 AM
> >>> It is just bunch of admin commands in the new SIOV or virtual virtio
> >>> device
> >> chapter.
> >>> It is not a transport chapter.
> >>
> >> Provisioning could be a part of the transport. For example SR-IOV
> >> allows static and dynamic provisioning. If a virtio device needs to
> >> be implemented in VF, it needs to be provisioned first.
> >>
> > Provisioning a device by admin != transport guest driver <-> device
> communication.
> >
> > Two very different things.
> 
> 
> When developing a new transport, we need to consider at least how to probe
> the device. Before probe the device, it needs to be provisioned.
Yes, and yes to both. I agree.

> If we want a new transport, it could implement the provisioning especially if 
> it
> want the design to be bus independent. For the proposal like transport
> virtqueue, it's also fine to use bus specific facility to provision and probe 
> the
> device.
> 
Provisioning is for defining the transport-specific attributes.
Each transport has a few common things and a few unique things.
So provisioning commands can define transport-specific attributes and a few
generic things.
This is fine.

> 
> That is what I want to say.
> 
> 
> >
> > And the motivation is also clear is to provide composing a virtio
> > device for the
>  guest VM for the backward compatibility for 1.x part of the 
>  specification.
> > It seems fine and indeed orthogonal to me that: it is for backward
>  compatibility for already defined config fields for existing guest VM 
>  driver.
> > It does not conflict with the cfgq/cmdq idea whose main purpose is
> > for the
>  new config fields, new use cases that doesn't require any mediation.
> > Such cfgq works across PF, VF, SF/SIOV devices in uniform way
> > without
>  mediation.
> 
>  My understanding is that the cfgq/cmdq could be done on top, since
>  the commands are part of transport unless I miss something.
> 
> >>> On top of what?
> >>> cfgq/cmdq is just another queue like any other VQ.
> >>> it is orthogonal to accessing 1.x registers using group owner's queue.
> >>
> >> I think there hasn't been any proposal that call any virtqueue like
> >> cfgq/cmdq. So using that may cause a lot of misunderstanding. I think
> >> the context here is to provide or extend transport facilities via a
> >> virtqueue not something like the control virtqueue.
> > The context is to compose a PCI device for a guest which for the currently
> defined features.
> > Once a new attribute is done using cfgq/cmdq there is no problem of
> transporting it via its group member.
> >
> > And therefore, saying any new attribute also to ride on same tranportq/aq 
> > via
> is not appropriate.
> 
> 
> The point is to try to find if there's a better way, more below.
> 
> 
> >
> >> Admin virtqueue doesn't preventing anything from letting IMS go
> >> directly to the device.
> > But there is no aq/cfgq/cmdq defined for the guest, so with current 
> > proposal it
> is prevented.
> 
> 
> It's a choice of the implementation but not spec. Spec doesn't say adminq 
> can't
> be used for guest.
> 
For the guest and by the guest are two different things.
I am proposing that a cfgq/cmdq should be usable directly by the guest driver as
a baseline, like any other VQ.
And if there is a need for mediation, it should be possible.

> I think we've discussed many times that legacy is tricky for hardware.
And we want to proceed for the described use case and requirements.

> And your proposal proves this further by introducing modern alike
> notification areas. 
Not really. A modern device is offering a legacy-style notification area to
the legacy interface.
So this is just fine.

> We probably need to deal with others like endianess.
>
No point in discussing this again. One size does not fit all.
It solves a large part of the use cases, so it is valuable and will be supported.
 
> For any case, having a simple and deterministic device design is always
> better assuming we've agreed that mediation is a must for legacy
> drivers. Using dedicated commands can make sure the implementation won't
> need to go with corner cases of legacy.
> 
Corner case handling just moved from the device to hypervisor.

> 
> >
> >>> Its same thing using more opcodes.
> >>
> >> Do we really care about the number of opcodes? I think we've reserved
> >> sufficient spaces.
> >>
> > I don’t see opcode as problem, we have plenty.
> > And hence, when there is a device that _does not implement config space for
> SIOV/SF, it should be introduced.
> > It is the introduction and bisection at the hypervisor level unnecessary.
> >
> > And I don’t have object to it either, the main reason is: I don’t see it 
> > being
> useful for 1.x access going forward.
> 
> 
> As you said for future SIOV/SF? For examp

[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang
On Tue, May 16, 2023 at 11:37 AM Jason Wang  wrote:
>
>
> On 2023/5/16 02:01, Michael S. Tsirkin wrote:
> > On Mon, May 15, 2023 at 06:00:02PM +, Parav Pandit wrote:
> >>> From: Michael S. Tsirkin 
> >>> Sent: Monday, May 15, 2023 1:56 PM
> >>>
> >>> On Mon, May 15, 2023 at 05:51:05PM +, Parav Pandit wrote:
> > From: Michael S. Tsirkin 
> > Sent: Monday, May 15, 2023 1:45 PM
> >
> > On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:
> >> All legacy interface via AQ.
> >> All modern interface access via PCI or its own transport between
> >> driver and
> > device.
> >
> > I am wondering however about the hypervisor notifications.
> > Generally these are the most problematic aspect here I feel.  For
> > example, how does it interact with VIRTIO_F_NOTIF_CONFIG_DATA?  And
> > generally, having both guest and host be allowed to access device's BAR
> >>> seems problematic.
> > Can we reserve a PF BAR region for these things or is that too 
> > expensive?
>  VIRTIO_F_NOTIF_CONFIG_DATA is not present for the legacy.
> >>> it is not but it just might be required, or it won't be there.
> >>>
> >> How can it be required if it not part of it?
> >> I likely didn't follow your question/comment, can you please explain.
> > VIRTIO_F_NOTIF_CONFIG_DATA is there presumably for a reason.
> >
>  For modern device, guest driver will access the device own BAR directly 
>  with
> >>> its own config_data anyway.
>  Should we reserve a region in the PF BAR for SF/SIOV device?, should 
>  device
> >>> report which BAR/area to use for the given SF/SIOV device?
>  May be yes, those are different discussions with tradeoff to consider 
>  during
> >>> SIOV discussions. It is not related to VFs.
> >>>
> >>> For SIOV it's a given. But we can do this for SRIOV: reserve a PF region 
> >>> per VF.
> >> Each VF has its own BAR area for driver notifications.
> > But it is passed through to guest presumably.
>
>
> Probably not in the case of VIRTIO_F_NOTIF_CONFIG_DATA. Did you see any
> problem if we do mediation here except for some performance penalty?

Ok, rethinking this: when using legacy, VIRTIO_F_NOTIF_CONFIG_DATA
should be assumed to be not negotiated. So I think I agree with Parav,
it should not be a problem.

Or do you mean for the devices that have some mandated features?

Thanks

>
> Thanks
>
>
> >


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang



On 2023/5/16 02:01, Michael S. Tsirkin wrote:

On Mon, May 15, 2023 at 06:00:02PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Monday, May 15, 2023 1:56 PM

On Mon, May 15, 2023 at 05:51:05PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Monday, May 15, 2023 1:45 PM

On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:

All legacy interface via AQ.
All modern interface access via PCI or its own transport between
driver and

device.

I am wondering however about the hypervisor notifications.
Generally these are the most problematic aspect here I feel.  For
example, how does it interact with VIRTIO_F_NOTIF_CONFIG_DATA?  And
generally, having both guest and host be allowed to access device's BAR

seems problematic.

Can we reserve a PF BAR region for these things or is that too expensive?

VIRTIO_F_NOTIF_CONFIG_DATA is not present for the legacy.

it is not but it just might be required, or it won't be there.


How can it be required if it is not part of it?
I likely didn't follow your question/comment, can you please explain.

VIRTIO_F_NOTIF_CONFIG_DATA is there presumably for a reason.


For modern device, guest driver will access the device own BAR directly with

its own config_data anyway.

Should we reserve a region in the PF BAR for SF/SIOV device?, should device

report which BAR/area to use for the given SF/SIOV device?

May be yes, those are different discussions with tradeoff to consider during

SIOV discussions. It is not related to VFs.

For SIOV it's a given. But we can do this for SRIOV: reserve a PF region per VF.

Each VF has its own BAR area for driver notifications.

But it is passed through to guest presumably.



Probably not in the case of VIRTIO_F_NOTIF_CONFIG_DATA. Did you see any 
problem if we do mediation here except for some performance penalty?


Thanks







-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang

Hi Parav:

On 2023/5/15 23:49, Parav Pandit wrote:

From: Jason Wang 
Sent: Monday, May 15, 2023 3:11 AM

It is just a bunch of admin commands in the new SIOV or virtual virtio device
chapter.

It is not a transport chapter.


Provisioning could be a part of the transport. For example SR-IOV allows static
and dynamic provisioning. If a virtio device needs to be implemented in VF, it
needs to be provisioned first.


Provisioning a device by admin != transport guest driver <-> device 
communication.

Two very different things.



When developing a new transport, we need to consider at least how to 
probe the device. Before probe the device, it needs to be provisioned. 
If we want a new transport, it could implement the provisioning 
especially if it want the design to be bus independent. For the proposal 
like transport virtqueue, it's also fine to use bus specific facility to 
provision and probe the device.



That is what I want to say.





And the motivation is also clear is to provide composing a virtio
device for the

guest VM for the backward compatibility for 1.x part of the specification.

It seems fine and indeed orthogonal to me that: it is for backward

compatibility for already defined config fields for existing guest VM driver.

It does not conflict with the cfgq/cmdq idea whose main purpose is
for the

new config fields, new use cases that doesn't require any mediation.

Such cfgq works across PF, VF, SF/SIOV devices in uniform way
without

mediation.

My understanding is that the cfgq/cmdq could be done on top, since
the commands are part of transport unless I miss something.


On top of what?
cfgq/cmdq is just another queue like any other VQ.
it is orthogonal to accessing 1.x registers using group owner's queue.


I think there hasn't been any proposal that call any virtqueue like cfgq/cmdq. 
So
using that may cause a lot of misunderstanding. I think the context here is to
provide or extend transport facilities via a virtqueue not something like the
control virtqueue.

The context is to compose a PCI device for a guest for the currently
defined features.
Once a new attribute is done using cfgq/cmdq there is no problem
transporting it via its group member.

And therefore, saying any new attribute should also ride on the same transportq/aq
is not appropriate.



The point is to try to find if there's a better way, more below.





Admin virtqueue doesn't prevent anything from letting IMS go directly to the
device.

But there is no aq/cfgq/cmdq defined for the guest, so with the current proposal it
is prevented.



It's a choice of the implementation but not spec. Spec doesn't say 
adminq can't be used for guest.








   Long discussion with Thomas on IMS topic.
I will avoid diverting to unrelated discussion for now.


Or proposed command in [1] should be tagged as siov_r1, then things will

be

cleaner.

Maybe we can say, one of the implementations is siov_r1, since it can be

used

to implement devices in other transport. One of the main motivations for
transport virtqueue is to reduce the transport specific resources to ease the
virtio device implementation.


Yes, the main motivation is backward compatibility for the currently
defined features.

With that I don't see the 3 legacy commands conflicting with [1] anyway.

It doesn't, but it's not elegant since:


I don't see how going from 3 commands to 9 commands makes it elegant by
bisecting register offsets in the hypervisor.


Or it needs to be done by the hardware and cross register write/read
needs to be handled there as well.


That is the limitation of legacy, and the device can always _decide_ to apply
the parameter later, when enough bits are written.
Given it is a control path, it is not an overhead.



I think we've discussed many times that legacy is tricky for hardware.
And your proposal proves this further by introducing modern-like
notification areas. We probably need to deal with other issues like endianness.


In any case, having a simple and deterministic device design is always
better, assuming we've agreed that mediation is a must for legacy
drivers. Using dedicated commands can make sure the implementation won't
need to deal with the corner cases of legacy.






Its same thing using more opcodes.


Do we really care about the number of opcodes? I think we've reserved
sufficient spaces.


I don't see opcodes as a problem, we have plenty.
And hence, when there is a device that does not implement config space for
SIOV/SF, it should be introduced.
It is the introduction and bisection at the hypervisor level that is unnecessary.

And I don't object to it either; the main reason is: I don't see it being
useful for 1.x access going forward.



As you said for future SIOV/SF? For example, it can have better 
scalability in IMS since it doesn't require any real capabilities or 
registers.






1) Modern transport, admin virtqueue with well defined transport

commands

2) Legacy transport, works more like a PCI transport on top of the adm

[virtio-dev] [PATCH] content: Replace guest OS with driver

2023-05-15 Thread Parav Pandit
Currently device status field description and driver requirements
section uses mix of terminology for the driver. These two sections
sometimes call the driver as 'the guest OS' or 'the driver'.

Most of the cleanup around 'guest OS' was already done around commit
212c0cf3 in the past. Clean up the remaining few references to just
refer to it as 'the driver'.

This is an editorial change.

Signed-off-by: Parav Pandit 
---
 content.tex | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/content.tex b/content.tex
index 9df81b8..417d476 100644
--- a/content.tex
+++ b/content.tex
@@ -26,10 +26,10 @@ \section{\field{Device Status} Field}\label{sec:Basic 
Facilities of a Virtio Dev
 following bits are defined (listed below in the order in which
 they would be typically set):
 \begin{description}
-\item[ACKNOWLEDGE (1)] Indicates that the guest OS has found the
+\item[ACKNOWLEDGE (1)] Indicates that the driver has found the
   device and recognized it as a valid virtio device.
 
-\item[DRIVER (2)] Indicates that the guest OS knows how to drive the
+\item[DRIVER (2)] Indicates that the driver knows how to drive the
   device.
   \begin{note}
 There could be a significant (or infinite) delay before setting
@@ -473,13 +473,13 @@ \section{Device Initialization}\label{sec:General 
Initialization And Device Oper
 \begin{enumerate}
 \item Reset the device.
 
-\item Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
+\item Set the ACKNOWLEDGE status bit: the driver has noticed the device.
 
-\item Set the DRIVER status bit: the guest OS knows how to drive the device.
+\item Set the DRIVER status bit: the driver knows how to drive the device.
 
 \item\label{itm:General Initialization And Device Operation /
 Device Initialization / Read feature bits} Read device feature bits, and write 
the subset of feature bits
-   understood by the OS and driver to the device.  During this step the
+   understood by the driver to the device.  During this step the
driver MAY read (but MUST NOT write) the device-specific configuration 
fields to check that it can support the device before accepting it.
 
 \item\label{itm:General Initialization And Device Operation / Device 
Initialization / Set FEATURES-OK} Set the FEATURES_OK status bit.  The driver 
MUST NOT accept
-- 
2.26.2


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] RE: [virtio-comment] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Monday, May 15, 2023 4:30 PM
> >
> > I am not sure if this is a real issue. Because even the legacy guests
> > have msix enabled by default. In theory yes, it can fall back to intx.
> 
> Well. I feel we should be closer to being sure it's not an issue if we are 
> going to
> ignore it.
> some actual data here:
> 
> Even linux only enabled MSI-X in 2009.
> Of course, other guests took longer. E.g.
> a quick google search gave me this for some bsd variant (2017):
> https://twitter.com/dragonflybsd/status/834494984229421057
> 
> Many guests have tunables to disable msix. Why?
> E.g. BSD keeps maintaining it at
>  hw.virtio.pci.disable_msix
> not a real use-case and you know 100% no guests have set this to work around
> some bug e.g. in bsd MSI-X core?  How can you be sure?
> 
> 
> 
> intx is used when guests run out of legacy interrupts, these setups are not 
> hard
> to create at all: just constrain the number of vCPUs while creating lots of
> devices.
> 
> 
> I could go on.
> 
> 
> 
> > There are few options.
> > 1. A hypervisor driver can be conservative and steal an msix of the VF
> > for transporting intx.
> > Pros: Does not need special things in device
> > Cons:
> > a. Fairly intrusive in hypervisor vf driver.
> > b. May not be ever used as guest is unlikely to fail on msix
> 
> Yea I do not like this since we are burning up msix vectors.
> More reasons: this "pass through" msix has no chance to set ISR properly since
> msix does not set ISR.
> 
> 
> > 2. Since multiple VFs intx to be serviced, one command per VF in AQ is
> > too much overhead that device needs to map a request to,
> >
> > A better way is to have an eventq of depth = num_vfs, like many other
> > virtio devices have it.
> >
> > An eventq can hold per VF interrupt entry including the isr value that
> > you suggest above.
> >
> > Something like,
> >
> > union eventq_entry {
> >   u8 raw_data[16];
> >   struct intx_entry {
> >   u8 event_opcode;
> >   u8 group_type;
> >   u8 reserved[6];
> >   le64 group_identifier;
> >   u8 isr_status;
> >   };
> > };
> >
> > This eventq resides on the owner parent PF.
> > isr_status is read on clear like today.
> 
> This is what I wrote no?
> lore.kernel.org/all/20230507050146-mutt-send-email-
> mst%40kernel.org/t.mbox.gz
> 
>   how about a special command that is used when device would
>   normally send INT#x? it can also return ISR to reduce latency.
> 
In response to your above suggestion of an AQ command,
I suggested the eventq that contains the isr_status, which reduces latency as you
suggest.

> > May be such eventq can be useful in future for wider case.
> 
> There's no maybe here is there? Things like live migration need events for 
> sure.
> 
> > We may have to find a different name for it as other devices has
> > device specific eventq.
> 
> We don't need a special name for it. Just use an adminq with a special
> command that is only consumed when there is an event.
This requires too many commands to be issued on the PF device,
potentially one per VF. And the device needs to keep track of the command-to-VF mapping.

> Note you only need to queue a command if MSI is disabled.
> Which is nice.
Yes, it is nice.
An eventq is a variation of it, where the device can keep reporting events
without doing the extra mapping and without too many commands.

Additionally, eventq also works for 1.x device which will read the ISR status 
registers directly from the device.

> 
> > I am inclined to differ this to a later point if one can identify the
> > real failure with msix for the guest VM.
> > So far we don't see this ever happening.
> 
> What is the question exactly?
> 
> Just have more devices than vectors,
> an intel CPU only has ~200 of these, and current drivers want to use 2 vectors
> and then fall back on INTx since that is shared.
> Extremely easy to create - do you want a qemu command line to try?
> 
An Intel CPU has 256 per core (per vCPU), so that is really a lot.
One needs to connect a lot more devices to the CPU to run out of them.
So yes, I would like to try the command line to see it fail.

> Do specific customers event use guests with msi-x disabled? Maybe no.
> Does anyone use virtio with msi-x disabled? Most likely yes.
I just feel that INTx emulation is an extremely rare/narrow case for some
applications and may never find use on HW-based devices.

> So if we are going for legacy pci emulation let's have a comprehensive legacy
> pci emulation please where host can either enable it for a guest or deny
> completely, not kind of start running then fail mysteriously.
A driver will easily be able to fail the call on INTx configuration, failing the
guest.

But let's see if we can align on an eventq/AQ scheme to make it work.

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-

[virtio-dev] [RFC PATCH 1/1] virtio-balloon: Add Working Set Reporting feature

2023-05-15 Thread T.J. Alumbaugh
Adds VIRTIO_F_WS_REPORTING feature bit.
Adds additional virtqueues and device operation details.

Signed-off-by: T.J. Alumbaugh 
---
 device-types/balloon/description.tex | 228 ++-
 1 file changed, 227 insertions(+), 1 deletion(-)

diff --git a/device-types/balloon/description.tex
b/device-types/balloon/description.tex
index a1d9603..4ea764f 100644
--- a/device-types/balloon/description.tex
+++ b/device-types/balloon/description.tex
@@ -22,6 +22,8 @@ \subsection{Virtqueues}\label{sec:Device Types /
Memory Balloon Device / Virtque
 \item[2] statsq
 \item[3] free_page_vq
 \item[4] reporting_vq
+\item[5] ws_vq
+\item[6] notification_vq
 \end{description}

   statsq only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
@@ -30,6 +32,8 @@ \subsection{Virtqueues}\label{sec:Device Types /
Memory Balloon Device / Virtque

   reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.

+  ws_vq and notification_vq only exist if VIRTIO_BALLOON_F_WS_REPORTING is set.
+
 \subsection{Feature bits}\label{sec:Device Types / Memory Balloon
Device / Feature bits}
 \begin{description}
 \item[VIRTIO_BALLOON_F_MUST_TELL_HOST (0)] Host has to be told before
@@ -48,7 +52,9 @@ \subsection{Feature bits}\label{sec:Device Types /
Memory Balloon Device / Featu
 Configuration field \field{poison_val} is valid.
 \item[ VIRTIO_BALLOON_F_PAGE_REPORTING(5) ] The device has support for free
 page reporting. A virtqueue for reporting free guest memory is present.
-
+\item[ VIRTIO_BALLOON_F_WS_REPORTING(6) ] The device has support for Working Set
+(WS) reporting. A virtqueue for reporting WS histograms is present (ws_vq) and
+a virtqueue to receive WS-related notifications (notification_vq) is present.
 \end{description}

 \drivernormative{\subsubsection}{Feature bits}{Device Types / Memory
Balloon Device / Feature bits}
@@ -86,6 +92,8 @@ \subsection{Device configuration
layout}\label{sec:Device Types / Memory Balloon
 read-only by the driver.
   \field{poison_val} is available if VIRTIO_BALLOON_F_PAGE_POISON has been
 negotiated.
+  \field{ws_num_bins} is available if VIRTIO_BALLOON_F_WS_REPORTING has been
+negotiated.

 \begin{lstlisting}
 struct virtio_balloon_config {
@@ -93,6 +101,7 @@ \subsection{Device configuration
layout}\label{sec:Device Types / Memory Balloon
 le32 actual;
 le32 free_page_hint_cmd_id;
 le32 poison_val;
+le32 ws_num_bins;
 };
 \end{lstlisting}

@@ -632,3 +641,220 @@ \subsubsection{Free Page
Reporting}\label{sec:Device Types / Memory Balloon Devi
 If the VIRTIO_BALLOON_F_PAGE_POISON feature has been negotiated, the device
 MUST NOT modify the the content of a reported page to a value other than
 \field{poison_val}.
+
+\subsubsection{Working Set Reporting}\label{sec:Device Types / Memory
Balloon Device / Device Operation / Working Set Reporting}
+
+A Working Set ("WS") measures what memory a computer system has recently
+used (where "recently" is application specific). In most practical systems,
+memory is viewed at the granularity of a page. An ideal system would check
+the access time for every page after every instruction, but this is not
+practical. In a realistic scenario, the idle age of a page can be defined as:
+
+\begin{lstlisting}
+idle_age = current_system_time - time_access_bit_was_cleared
+\end{lstlisting}
+
+\field{time_access_bit_was_cleared} is a proxy for "time of last access."
+Checking (and clearing) the "accessed" bit on a page table entry is a typical
+task in operating systems, running from time to time in memory management
+activities. In this scheme, accuracy is sacrificed for improved performance
+(since less time overall is spent on scanning the memory).
+
+The Working Set consists of "bins" of pages of similar estimated idle age.
+Collecting idle ages for large sets of pages means finding convenient and
+efficient times to check the accessed bits. For all these pages, we associate
+some time \field{t} with the set, and logically consider them as "accessed
+no later than time t."
+
+The collection of "binned" sets of pages is best described as a histogram,
+where each bin has an associated idle age and all pages in the bin have been
+idle for no longer than that age.
+
+\paragraph{Memory Types: Working Set Reporting}\label{sec:Device
Types / Memory Balloon Device / Device Operation / Working Set
Reporting / Memory Types: Working Set Reporting}
+
+Each bin can describe more than one type of memory, reflecting the different
+types of pages tracked by an operating system. Memory types are enumerated
+in the \field{virtio_balloon_ws_memory_type} enum. To guarantee backwards
+compatibility, devices are free to ignore unrecognized WS memory type values.
+
+\begin{lstlisting}
+enum virtio_balloon_ws_memory_type {
+  VIRTIO_BALLOON_WS_ANON,
+  VIRTIO_BALLOON_WS_FILE,
+};
+\end{lstlisting}
+
+The supported memory types are as follows:
+
+\begin{description}
+\item[ANON] Memory that is not backed by files.
+
+\item

[virtio-dev] [RFC PATCH 0/1] virtio-balloon: Add Working Set Reporting feature

2023-05-15 Thread T.J. Alumbaugh
This is a proposed spec expansion for a Working Set Reporting feature
in the balloon with driver patch here:

https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuan...@google.com/

with device implementation here:

https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg02503.html

It describes the requirements for a VIRTIO_F_WS_REPORTING feature bit
on the balloon device.

Motivation
==
When we have a system with overcommitted memory and 1 or more VMs, we
seek to get both timely and accurate information on overall memory
utilization in order to drive appropriate reclaim activities. For
example, in some client device use cases a VM might need a significant
fraction of the overall memory for a period of time, but then enter a
quiet period that results in a large number of cold pages in the guest.

The balloon device has a number of features to assist in sharing memory
resources amongst the guests and host (e.g free page hinting, stats, free page
reporting). As mentioned in slide 12 in [1], the balloon doesn't have a good
mechanism to drive the reclaim of guest cache. Our use case includes both
typical page cache as well as "application caches" with memory that should be
discarded in times of system-wide memory pressure.

Working Set Reporting
=

Working Set reporting in the balloon provides:

 - an accurate picture of current memory utilization in the guest
 - event driven reporting (with configurable rate limiting) to deliver reports
   during times of memory pressure.

The reporting mechanism can be combined with a domain-specific balloon policy
to drive the separate reclaim activities in a coordinated fashion.

TODOs:
==

 - There are some small differences between this spec and the
   implementation in the data exchange protocol in the device. We wanted to
   get feedback on this diff at an early stage though, rather than get every
   piece nailed down with precision.

References:

[1] 
https://kvmforum2020.sched.com/event/eE4U/virtio-balloonpmemmem-managing-guest-memory-david-hildenbrand-michael-s-tsirkin-red-hat

T.J. Alumbaugh (1):
  virtio-balloon: Add Working Set Reporting feature

 device-types/balloon/description.tex | 228 ++-
 1 file changed, 227 insertions(+), 1 deletion(-)

-- 
2.40.1.606.ga4b1b128d6-goog

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Mon, May 08, 2023 at 12:54:55PM -0400, Parav Pandit wrote:
> 
> On 5/7/2023 5:04 AM, Michael S. Tsirkin wrote:
> > 
> > one things I still don't see addressed here is support for
> > legacy interrupts. legacy driver can disable msix and
> > interrupts will be then sent.
> > how about a special command that is used when device would
> > normally send INT#x? it can also return ISR to reduce latency.
> 
> I am not sure if this is a real issue. Because even the legacy guests have
> msix enabled by default. In theory yes, it can fall back to intx.

Well. I feel we should be closer to being sure it's not
an issue if we are going to ignore it.
some actual data here:

Even linux only enabled MSI-X in 2009.
Of course, other guests took longer. E.g.
a quick google search gave me this for some bsd variant (2017):
https://twitter.com/dragonflybsd/status/834494984229421057

Many guests have tunables to disable msix. Why?
E.g. BSD keeps maintaining it at
 hw.virtio.pci.disable_msix
Not a real use case? And do you know 100% that no guests have set this
to work around some bug, e.g. in the BSD MSI-X core?  How can you be sure?



intx is used when guests run out of legacy interrupts,
these setups are not hard to create at all: just constrain the
number of vCPUs while creating lots of devices.


I could go on.



> There are a few options.
> 1. A hypervisor driver can be conservative and steal an msix vector of the VF
> for transporting intx.
> Pros: Does not need special things in device
> Cons:
> a. Fairly intrusive in hypervisor vf driver.
> b. May never be used, as the guest is unlikely to fail on msix

Yea I do not like this since we are burning up msix vectors.
More reasons: this "pass through" msix has no chance to
set ISR properly since msix does not set ISR.


> 2. Since multiple VFs' intx need to be serviced, one command per VF in the AQ
> is too much overhead for the device to map a request to.
> 
> A better way is to have an eventq of depth = num_vfs, like many other virtio
> devices have it.
> 
> An eventq can hold per VF interrupt entry including the isr value that you
> suggest above.
> 
> Something like,
> 
> union eventq_entry {
>   u8 raw_data[16];
>   struct intx_entry {
>   u8 event_opcode;
>   u8 group_type;
>   u8 reserved[6];
>   le64 group_identifier;
>   u8 isr_status;
>   };
> };
> 
> This eventq resides on the owner parent PF.
> isr_status is read on clear like today.

This is what I wrote no?
lore.kernel.org/all/20230507050146-mutt-send-email-mst%40kernel.org/t.mbox.gz

how about a special command that is used when device would
normally send INT#x? it can also return ISR to reduce latency.
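For what it's worth, a compilable sketch of such an entry (illustrative only,
not spec text): the field list quoted above adds up to 17 bytes
(1 + 1 + 6 + 8 + 1), so this version shrinks reserved[] to 5 bytes and moves
isr_status ahead of the identifier so the whole entry stays at 16 bytes with
the 64-bit field naturally aligned; the real layout would of course be settled
in the spec.

#include <stdint.h>

union eventq_entry {
    uint8_t raw_data[16];
    struct {
        uint8_t  event_opcode;     /* e.g. "INT#x asserted"              */
        uint8_t  group_type;       /* group the member belongs to        */
        uint8_t  isr_status;       /* ISR value, read-on-clear semantics */
        uint8_t  reserved[5];
        uint64_t group_identifier; /* which member (VF) raised it        */
    } intx;
};

_Static_assert(sizeof(union eventq_entry) == 16, "entry must stay 16 bytes");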



> May be such eventq can be useful in future for wider case.

There's no maybe here is there? Things like live migration
need events for sure.

> We may have to find a different name for it as other devices have
> device-specific eventqs.

We don't need a special name for it. Just use an adminq
with a special command that is only consumed when there is an event.
Note you only need to queue a command if MSI is disabled.
Which is nice.




> I am inclined to defer this to a later point if one can identify a real
> failure with msix for the guest VM.
> So far we don't see this ever happening.

What is the question exactly?

Just have more devices than vectors: an Intel CPU only has ~200 of these, and
current drivers want to use 2 vectors and then fall back on INTx since that is
shared.
Extremely easy to create - do you want a qemu command line to try?
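(As a rough illustration with the numbers above: ~200 usable vectors on a
single-vCPU guest, with drivers wanting 2 MSI-X vectors per device, means on
the order of a hundred virtio devices is already enough to leave some devices
without their own vectors and falling back to shared INTx. Not a measured
limit, just arithmetic.)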

Do specific customers even use guests with msi-x disabled? Maybe no.
Does anyone use virtio with msi-x disabled? Most likely yes.
So if we are going for legacy pci emulation, let's please have
a comprehensive legacy pci emulation where
the host can either enable it for a guest or deny it completely,
not kind of start running and then fail mysteriously.


-- 
MST





[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Monday, May 15, 2023 2:02 PM
> > >
> > How can it be required if it not part of it?
> > I likely didn't follow your question/comment, can you please explain.
> 
> VIRTIO_F_NOTIF_CONFIG_DATA is there presumably for a reason.
The discussion is for the legacy interface for which CONFIG_DATA doesn't exist.

> 
> > > > For modern device, guest driver will access the device own BAR
> > > > directly with
> > > its own config_data anyway.
> > > > Should we reserve a region in the PF BAR for SF/SIOV device?,
> > > > should device
> > > report which BAR/area to use for the given SF/SIOV device?
> > > > May be yes, those are different discussions with tradeoff to
> > > > consider during
> > > SIOV discussions. It is not related to VFs.
> > >
> > > For SIOV it's a given. But we can do this for SRIOV: reserve a PF region 
> > > per
> VF.
> > Each VF has its own BAR area for driver notifications.
> 
> But it is passed through to guest presumably.
Yes, so?




[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Mon, May 15, 2023 at 06:00:02PM +, Parav Pandit wrote:
> > From: Michael S. Tsirkin 
> > Sent: Monday, May 15, 2023 1:56 PM
> > 
> > On Mon, May 15, 2023 at 05:51:05PM +, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin 
> > > > Sent: Monday, May 15, 2023 1:45 PM
> > > >
> > > > On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:
> > > > > All legacy interface via AQ.
> > > > > All modern interface access via PCI or its own transport between
> > > > > driver and
> > > > device.
> > > >
> > > > I am wondering however about the hypervisor notifications.
> > > > Generally these are the most problematic aspect here I feel.  For
> > > > example, how does it interact with VIRTIO_F_NOTIF_CONFIG_DATA?  And
> > > > generally, having both guest and host be allowed to access device's BAR
> > seems problematic.
> > > > Can we reserve a PF BAR region for these things or is that too 
> > > > expensive?
> > >
> > > VIRTIO_F_NOTIF_CONFIG_DATA is not present for the legacy.
> > 
> > it is not but it just might be required, or it won't be there.
> > 
> How can it be required if it not part of it?
> I likely didn't follow your question/comment, can you please explain.

VIRTIO_F_NOTIF_CONFIG_DATA is there presumably for a reason.

> > > For modern device, guest driver will access the device own BAR directly 
> > > with
> > its own config_data anyway.
> > > Should we reserve a region in the PF BAR for SF/SIOV device?, should 
> > > device
> > report which BAR/area to use for the given SF/SIOV device?
> > > May be yes, those are different discussions with tradeoff to consider 
> > > during
> > SIOV discussions. It is not related to VFs.
> > 
> > For SIOV it's a given. But we can do this for SRIOV: reserve a PF region 
> > per VF.
> Each VF has its own BAR area for driver notifications.

But it is passed through to guest presumably.

-- 
MST





[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit
> From: Michael S. Tsirkin 
> Sent: Monday, May 15, 2023 1:56 PM
> 
> On Mon, May 15, 2023 at 05:51:05PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, May 15, 2023 1:45 PM
> > >
> > > On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:
> > > > All legacy interface via AQ.
> > > > All modern interface access via PCI or its own transport between
> > > > driver and
> > > device.
> > >
> > > I am wondering however about the hypervisor notifications.
> > > Generally these are the most problematic aspect here I feel.  For
> > > example, how does it interact with VIRTIO_F_NOTIF_CONFIG_DATA?  And
> > > generally, having both guest and host be allowed to access device's BAR
> seems problematic.
> > > Can we reserve a PF BAR region for these things or is that too expensive?
> >
> > VIRTIO_F_NOTIF_CONFIG_DATA is not present for the legacy.
> 
> it is not but it just might be required, or it won't be there.
> 
How can it be required if it not part of it?
I likely didn't follow your question/comment, can you please explain.

> > For modern device, guest driver will access the device own BAR directly with
> its own config_data anyway.
> > Should we reserve a region in the PF BAR for SF/SIOV device?, should device
> report which BAR/area to use for the given SF/SIOV device?
> > May be yes, those are different discussions with tradeoff to consider during
> SIOV discussions. It is not related to VFs.
> 
> For SIOV it's a given. But we can do this for SRIOV: reserve a PF region per 
> VF.
Each VF has its own BAR area for driver notifications.




[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Mon, May 15, 2023 at 05:51:05PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, May 15, 2023 1:45 PM
> > 
> > On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:
> > > All legacy interface via AQ.
> > > All modern interface access via PCI or its own transport between driver 
> > > and
> > device.
> > 
> > I am wondering however about the hypervisor notifications.  Generally these
> > are the most problematic aspect here I feel.  For example, how does it 
> > interact
> > with VIRTIO_F_NOTIF_CONFIG_DATA?  And generally, having both guest and
> > host be allowed to access device's BAR seems problematic.
> > Can we reserve a PF BAR region for these things or is that too expensive?
> 
> VIRTIO_F_NOTIF_CONFIG_DATA is not present for the legacy.

it is not but it just might be required, or it won't be there.

> For modern device, guest driver will access the device own BAR directly with 
> its own config_data anyway.
> Should we reserve a region in the PF BAR for SF/SIOV device?, should device 
> report which BAR/area to use for the given SF/SIOV device?
> May be yes, those are different discussions with tradeoff to consider during 
> SIOV discussions. It is not related to VFs.

For SIOV it's a given. But we can do this for SRIOV: reserve a PF region
per VF.





[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Monday, May 15, 2023 1:45 PM
> 
> On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:
> > All legacy interface via AQ.
> > All modern interface access via PCI or its own transport between driver and
> device.
> 
> I am wondering however about the hypervisor notifications.  Generally these
> are the most problematic aspect here I feel.  For example, how does it 
> interact
> with VIRTIO_F_NOTIF_CONFIG_DATA?  And generally, having both guest and
> host be allowed to access device's BAR seems problematic.
> Can we reserve a PF BAR region for these things or is that too expensive?

VIRTIO_F_NOTIF_CONFIG_DATA is not present for the legacy.
For modern device, guest driver will access the device own BAR directly with 
its own config_data anyway.
Should we reserve a region in the PF BAR for the SF/SIOV device? Should the
device report which BAR/area to use for the given SF/SIOV device?
Maybe yes; those are different discussions with tradeoffs to consider during 
SIOV discussions. It is not related to VFs.




[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Mon, May 15, 2023 at 03:49:44PM +, Parav Pandit wrote:
> All legacy interface via AQ.
> All modern interface access via PCI or its own transport between driver and 
> device.

I am wondering however about the hypervisor notifications.  Generally
these are the most problematic aspect here I feel.  For example, how
does it interact with VIRTIO_F_NOTIF_CONFIG_DATA?  And generally, having
both guest and host be allowed to access device's BAR seems problematic.
Can we reserve a PF BAR region for these things or is that too
expensive?

-- 
MST





[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit
Hi Michael,

> From: virtio-comm...@lists.oasis-open.org On Behalf Of Parav Pandit
> Sent: Thursday, May 11, 2023 5:03 PM
> 
> > From: Michael S. Tsirkin 
> > Sent: Thursday, May 11, 2023 4:58 PM
> >
> > On Thu, May 11, 2023 at 08:47:04PM +, Parav Pandit wrote:
> > > > From: Michael S. Tsirkin 
> > > > Sent: Thursday, May 11, 2023 9:39 AM
> > > >
> > > > On Thu, May 11, 2023 at 01:28:40PM +, Parav Pandit wrote:
> > > > > This is not the objective to transport all PCI virtio fields over AQ.
> > > > > Transport VQ commands proposal commands and commands in this
> > > > > proposal
> > > > are just fine the way there without quoting it as "transport".
> > > >
> > > > I mean, it's a transport in a sense of where it will reside in e.g.
> > > > linux virtio stack - below virtio core.
> > > > We don't save much code there by reusing the same register layout
> > > > of legacy pci.
> > > > But let's at least make this feature complete, and then we'll see
> > > > whether it is cleaner to reuse legacy layout or build a new one.
> > >
> > > I don't follow your comment about "this feature complete".
> > > When you say _this_, it means the feature proposed in this patch of
> > supporting legacy over PCI VFs?
> > >
> > > If so, anything missing in this proposal?
> >
> > I think I commented at least on what I see as missing is first of all
> > a need to support INT#x emulation since level interrupts are I think not
> supported for VFs.
> > right?
> 
> I don't see it mandatory as I responded in [1].
> I didn't see your response on [1].
> 
> [1] https://lore.kernel.org/virtio-comment/71d65eb3-c025-9287-0157-
> 81e1d0557...@nvidia.com/

Do you have any further suggestions, or can we defer the eventq to the future,
when/if there is any real problem?




[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit

> From: Jason Wang 
> Sent: Monday, May 15, 2023 3:31 AM

> > What do we gain from bisecting it?
> 
> 
> 1) No need to deal with the offset in the hardware
> 
In the context of legacy it doesn't matter much given that it's over AQ.

> 2) transport independent
> 
AQ commands are transport independent, so it is covered.

> 3) avoid duplication with the (legacy support of) transport virtqueue
> 
Modern interface does not need mediation; hence it is not duplication with 
legacy.

> 
> > Every new addition needs involvement of the hypervisor as Michael noted
> > below on the "effort" point.
> 
> 
> I'm not sure I get your point.
> 
> If additional effort is not wanted, we need a full PCI over adminq,
> that's what I suggested, then we don't need any extension for any new
> features probably.
> 
When there is already a PCI VF, I don’t see why one would implement "PCI over 
adminq".
It is impractical.
And SF/SIOV is still a bit far in the future.

When "PCI over adminq" happen, it is really a very different set of commands 
that emulated the "PCI virtio" device.
I don’t know when/why.

> If we choose to invent dedicated adminq commands:
> 
> For legacy, we don't need any new additional effort since the above 11
> maps to the legacy virtio-pci facilities.
> 
Only a few commands are needed.
Modern interface config doesn’t travel through mediation.

> For modern, if the feature is something related to the transport, we
> need new transport interface for sure.
> 
Everything is transport, as it flows to/from the driver to the device and is
covered in the respective transport chapter. 😊

> 
> >
> > The register offset and read/write is far simpler interface for hypervisor.
> 
> 
> If this is true, why not go full PCI over adminq? Hypervisor in this
> case even don't need to deal with config space.
> 
I don't see a use case now, it's extremely heavy to do so, and it doesn't
apply to PCI VFs at all.

> 
> >
> > Not to do cross register access is what we need to define in the spec for 
> > 1.x or
> future things.
> 
> 
> You can't assume the hypervisor behaviors, so the device or at least the
> spec should be ready for that.
>
My above comment is for the driver to device interface to access registers on 
its natural boundary.
Which part of the proposal assumes a hypervisor behavior?
Nothing to do with hypervisor.


[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit

> From: Jason Wang 
> Sent: Monday, May 15, 2023 3:11 AM

> > It is just bunch of admin commands in the new SIOV or virtual virtio device
> chapter.
> > It is not a transport chapter.
> 
> 
> Provisioning could be a part of the transport. For example SR-IOV allows 
> static
> and dynamic provisioning. If a virtio device needs to be implemented in VF, it
> needs to be provisioned first.
> 
Provisioning a device by admin != transport guest driver <-> device 
communication.

Two very different things.

> 
> >
> >>> And the motivation is also clear is to provide composing a virtio
> >>> device for the
> >> guest VM for the backward compatibility for 1.x part of the specification.
> >>> It seems fine and indeed orthogonal to me that: it is for backward
> >> compatibility for already defined config fields for existing guest VM 
> >> driver.
> >>> It does not conflict with the cfgq/cmdq idea whose main purpose is
> >>> for the
> >> new config fields, new use cases that doesn't require any mediation.
> >>> Such cfgq works across PF, VF, SF/SIOV devices in uniform way
> >>> without
> >> mediation.
> >>
> >> My understanding is that the cfgq/cmdq could be done on top, since
> >> the commands are part of transport unless I miss something.
> >>
> > On top of what?
> > cfgq/cmdq is just another queue like any other VQ.
> > it is orthogonal to accessing 1.x registers using group owner's queue.
> 
> 
> I think there hasn't been any proposal that call any virtqueue like 
> cfgq/cmdq. So
> using that may cause a lot of misunderstanding. I think the context here is to
> provide or extend transport facilities via a virtqueue not something like the
> control virtqueue.
The context is to compose a PCI device for a guest for the currently defined
features.
Once a new attribute is done using cfgq/cmdq there is no problem transporting
it via its group member.

And therefore, saying that any new attribute should also ride on the same
transportq/aq is not appropriate.

> 
> Admin virtqueue doesn't prevent anything from letting IMS go directly to
> the
> device.
But there is no aq/cfgq/cmdq defined for the guest, so with current proposal it 
is prevented.

> 
> 
> >   Long discussion with Thomas on IMS topic.
> > I will avoid diverting to unrelated discussion for now.
> >
> >>> Or proposed command in [1] should be tagged as siov_r1, then things will
> be
> >> cleaner.
> >>
> >> Maybe we can say, one of the implementations is siov_r1, since it can be
> used
> >> to implement devices in other transport. One of the main motivations for
> >> transport virtqueue is to reduce the transport specific resources to ease 
> >> the
> >> virtio device implementation.
> >>
> > Yes, the main motivation as for backward compatibility for the currently
> defined features.
> >
> >>> With that I don't see legacy 3 commands anyway conflict with [1].
> >> It doesn't, but it's not elegant since:
> >>
> > I don’t see how making 3 commands to 9 commands makes it elegant by
> doing bisecting of registers offset in hypervisor.
> 
> 
> Or it needs to be done by the hardware and cross register write/read
> needs to be handled there as well.
> 
That is the limitation of legacy, and the device can always _decide_ to apply
the parameter later, once enough bits are written.
Given it is the control path, it is not an overhead.

> 
> > Its same thing using more opcodes.
> 
> 
> Do we really care about the number of opcodes? I think we've reserved
> sufficient spaces.
> 
I don't see opcodes as a problem; we have plenty.
And hence, when there is a device that does _not_ implement config space for
SIOV/SF, it should be introduced.
It is the introduction and bisection at the hypervisor level that is
unnecessary.

And I don't have an objection to it either; the main reason is: I don't see it
being useful for 1.x access going forward.

> 
> >
> >> 1) Modern transport, admin virtqueue with well defined transport
> commands
> >> 2) Legacy transport, works moe like a PCI transport on top of the admin
> >> virtqueue
> >>
> >> Transport virtqueue tend to be transport independent, but 2) tie it somehow
> to
> >> the PCI. That's why I suggest to
> >>
> > The 4 patches are not transporting all the PCI things over transport VQ. it 
> > is
> not a transport.
> 
> 
> It really depends on the different viewpoint. 

You wrote/asked for "PCI transport over admin q".
That means PCI capabilities, PCI config registers and all things defined in the 
PCI transport section to go over the AQ.
This is clearly not the objective of these 2 patches.
Nor is it your proposal of transport VQ.

You only want to exchange currently defined config registers, interrupts and 
notification configuration using AQ.

Transport VQ is NOT transporting actual data; it is not transporting
notifications, etc.
It is just a config channel.
Hence, they are just commands, not a "full transport".

> To me, adminq is not
> specific to PCI, so this proposal looks more like a partial transport
> for legacy.
It is not a partial transport f

[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit

> From: Jason Wang 
> Sent: Monday, May 15, 2023 1:20 AM

> 
> Right, but as discussed, a possible user is the SIOV transitional device via
> transport virtqueues.
> 
> 
> > transport vq might, but I am worried we'll have to spend extra
> > work clarifying it's a legacy interface. For example we will
> > have to explain that only 32 bits of features must be used.
> > Focusing more effort on transport vq is attractive though ...
> > It is worth examining this option in more depth.
> > Did you?
> 
> 
> My understanding is, for any case, we need such clarifications. What
> Parav proposed here is not the access to the actual registers but stick
> to the semantic of those registers. I think we must say high 32bits of
> the features must be assumed to be 0.
> 
I avoided duplicating anything that is already written for the PCI interface.
In legacy interface section of PCI, we already have:

"Note that only Feature Bits 0 to 31 are accessible through the Legacy 
Interface".
If MMIO is missing such a line in the existing interface, it is better to add
it there.
It is not in the scope of this patch.
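For illustration, the practical effect of that quoted sentence on a device
implementation is just a truncation of the 64-bit modern feature set (sketch
only, not spec text):

#include <stdint.h>

/* Only feature bits 0..31 are visible through the legacy interface, so a
 * device exposing both interfaces simply reports the low 32 bits there. */
static inline uint32_t legacy_visible_features(uint64_t modern_features)
{
    return (uint32_t)(modern_features & 0xffffffffULL);
}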


[virtio-dev] RE: [virtio-comment] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit

> From: Jason Wang 
> Sent: Monday, May 15, 2023 1:12 AM
> 
> On 2023/5/11 22:31, Parav Pandit wrote:
> >> From: Jason Wang 
> >> Sent: Thursday, May 11, 2023 3:17 AM
> >>> a configq/cmdq discussion is for new features for PF, VF, and
> >>> SF/SIOV devices to work in _non_ backward compatible way, because
> >>> for new features there is no such thing as backward compatibility
> >>> between guest driver and the device.
> >> Ok, so what command did configq/cmdq have?
> >> I thought it was similar to transport virtqueue, but it looks more like a
> control virtqueue here.
> > Yes, it is more like a cvq to do device attributes discovery and 
> > configuration.
> > Just that its generic enough.
> > For example
> > creating VQ that spans multiple physical addresses.
> > Discovery and enabling new device attributes which are rarely accessed or
> accessed during device init/cleanup time.
> >
> > The fundamental difference between cfgvq to tvq is:
> > Cfgvq is between the member virtio device and its driver.
> > Tvq=aq is on group owner of the virtio device.
> >
> > Compatibility is coming at a cost which is not scaling.
> > If you attempt to scale, it breaks the future path forward to go
> > without
> >> mediation.
> >>> So eve growing these fields and optionally placement on configq
> >>> doesn't really help and device builder to build it efficiently
> >>> (without much predictability).
> >> Config queue is not the only choice, we have a lot of other
> >> choices (for example PASID may help to reduce the on-chip resources).
> >>
> > PASID is for process isolation security construct, it is not for
> transportation.
>  I meant with PASID you don't even need a BAR.
> 
> >>> Not sure I fully understand, but my guess is, this is coming from
> >>> mediation thought process.
> >> Not actually. I meant, for example without PASID, for modern devices,
> >> the common cfg structure must be placed on a BAR through MMIO.
> >>
> >> But with the help of PASID, it could stay in the memory and the
> >> device can fetch it via DMA. The help to reduce the resources for 
> >> registers.
> >>
> > Yes, it can stay in memory and device can fetch via DMA using a VQ using its
> own q from guest driver.
> > Why does it need PASID?
> 
> 
> Let me clarify my understanding.
> 
> If we want a per VF virtqueue that is used by the hypervisor. We need PASID.
> Otherwise not.

Yes, we are on the same page: only when such a device is mediated by the
hypervisor.
Without mediation PASID is not required.


[virtio-dev] RE: [virtio-comment] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit


> From: Jason Wang 
> Sent: Thursday, May 11, 2023 3:21 AM
> 
> On Thu, May 11, 2023 at 12:08 AM Parav Pandit  wrote:
> >
> >
> >
> > On 5/10/2023 12:22 AM, Jason Wang wrote:
> > > On Wed, May 10, 2023 at 11:51 AM Jason Wang 
> wrote:
> >
> > >
> > > Just to make sure we are at the same page.
> > >
> > > 1) if the hardware has configq and we need to make it work for
> > > current virtio-pci driver, hypervisor needs to trap guest PCI access
> > > and translate it to configq command. This means the hypervisor needs
> > > to hide the configq from guests. In this case the configq needs a
> > > dedicated DMA address which is what PASID can help.
> >
> > > 2) if the hardware can report the device states, unless we want to
> > > migrate L2 guest, hypervisor should hide it from L1, so PASID is
> > > required to isolate the DMA for guest traffic and device state.
> >
> > A configq negotiates + discovers new fields for the PCI PF, VF,
> > SF/SIOV devices over PCIe or other transports.
> > So no need to hide/mediate for hw based devices, like cvq and like data vqs.
> 
> As discussed, I may misunderstand the function of the configq. If it works 
> like
> cvq, then yes. We don't need PASID.
> 
Right.

> But if we want to report device state via let's say migration q, it requires 
> PASID
> since it is used by the hypervisor.

When a device is used as passthrough, usually the owner group member would 
perform the device state management.
Hence passthrough mode will not use pasid.

When the device is mediated in the HV, yes, PASID may be needed to isolate the
DMA of the hypervisor from the guest.


[virtio-dev] RE: [PATCH v17 03/11] content: Rename confusing queue_notify_data and vqn names

2023-05-15 Thread Parav Pandit


> From: Cornelia Huck 
> Sent: Monday, May 15, 2023 5:43 AM
> 
> On Fri, May 05 2023, Parav Pandit  wrote:
> 
> > diff --git a/content.tex b/content.tex index 9cf938c..327926f 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -405,8 +405,12 @@ \section{Driver Notifications} \label{sec:Basic
> > Facilities of a Virtio Device /  notification to the device.
> >
> >  When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, -this
> > notification involves sending the -virtqueue index to the device
> > (method depending on the transport).
> > +this notification contains either a virtqueue index if
> > +VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied
> > +virtqueue notification config data if VIRTIO_F_NOTIF_CONFIG_DATA is
> negotiated.
> > +
> > +The notification method and suppling any such virtqueue notification
> > +config data is transport specific.
> 
> Typo: s/suppling/supplying/ (trivial to fix on top)

Fixed at [1].
[1] https://lists.oasis-open.org/archives/virtio-comment/202305/msg00208.html




[virtio-dev] [PATCH v1] content: Fix spelling error

2023-05-15 Thread Parav Pandit
Fix spelling error from suppling to supplying.

This is on top of [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202305/msg00101.html

Reported-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
changelog:
v0->v1:
- corrected subject
---
 content.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content.tex b/content.tex
index 327926f..9df81b8 100644
--- a/content.tex
+++ b/content.tex
@@ -409,8 +409,8 @@ \section{Driver Notifications} \label{sec:Basic Facilities of a Virtio Device /
 VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied virtqueue
 notification config data if VIRTIO_F_NOTIF_CONFIG_DATA is negotiated.
 
-The notification method and suppling any such virtqueue notification config data
-is transport specific.
+The notification method and supplying any such virtqueue notification config
+data is transport specific.
 
 However, some devices benefit from the ability to find out the
 amount of available data in the queue without accessing the virtqueue in memory:
-- 
2.26.2





[virtio-dev] RE: [PATCH] content: Fix type error

2023-05-15 Thread Parav Pandit


> From: Parav Pandit 
> Sent: Monday, May 15, 2023 11:05 AM
> To: m...@redhat.com; coh...@redhat.com; virtio-comment@lists.oasis-
> open.org
> Cc: virtio-dev@lists.oasis-open.org; Shahaf Shuler ; Parav
> Pandit 
> Subject: [PATCH] content: Fix type error
Sorry, I was supposed to send this for self-review and used the wrong script.





[virtio-dev] [PATCH] content: Fix type error

2023-05-15 Thread Parav Pandit
Fix spelling error from suppling to supplying.

This is on top of [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202305/msg00101.html

Reported-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 content.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content.tex b/content.tex
index 327926f..9df81b8 100644
--- a/content.tex
+++ b/content.tex
@@ -409,8 +409,8 @@ \section{Driver Notifications} \label{sec:Basic Facilities of a Virtio Device /
 VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied virtqueue
 notification config data if VIRTIO_F_NOTIF_CONFIG_DATA is negotiated.
 
-The notification method and suppling any such virtqueue notification config data
-is transport specific.
+The notification method and supplying any such virtqueue notification config
+data is transport specific.
 
 However, some devices benefit from the ability to find out the
 amount of available data in the queue without accessing the virtqueue in memory:
-- 
2.26.2





[virtio-dev] RE: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Parav Pandit

> From: Michael S. Tsirkin 
> Sent: Monday, May 15, 2023 6:09 AM
> To be more specific, device config often has the same layout under legacy and
> modern. Thus if getting an offset within device config, same decoding logic
> should be reusable between legacy and modern.
Modern device registers are directly accessible from the guest driver without 
mediation for VF,SF,SIOV.
Hence, there is no need to have commands for modern register access over some 
VQ.

Will reply to the rest of Jason's comments in some time.


[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Michael S. Tsirkin
On Mon, May 15, 2023 at 03:30:47PM +0800, Jason Wang wrote:
> 
> On 2023/5/11 21:02, Parav Pandit wrote:
> > 
> > > From: Michael S. Tsirkin 
> > > Sent: Thursday, May 11, 2023 8:55 AM
> > > > 1) device features
> > > > 2) driver features
> > > > 3) queue address
> > > > 4) queue size
> > > > 5) queue select
> > > > 6) queue notify
> > > > 7) device status
> > > > 8) ISR status
> > > > 9) config msix
> > > > 10) queue msix
> > > > 11) device configuration space
> > > > 
> > #9 may not even go to the group owner device.
> 
> 
> It the config_msix_vector register so I don't understand why it can't work
> but can work with your proposal.
> 
> 
> > What do we gain from bisecting it?
> 
> 
> 1) No need to deal with the offset in the hardware
> 
> 2) transport independent
> 
> 3) avoid duplication with the (legacy support of) transport virtqueue

To be more specific, device config often has the same layout under
legacy and modern. Thus if getting an offset within device config,
same decoding logic should be reusable between legacy and modern.
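A sketch of that reuse (hypothetical helper, not from the spec): since the
device-config layout is the same whether it is reached through the legacy I/O
BAR or the modern device_cfg capability, one offset-based handler can back
both paths:

#include <stdint.h>
#include <string.h>

/* Example device config (virtio-net style: mac then status); identical in
 * legacy and modern modes, so the same bounds check and copy serve both. */
struct net_cfg_example {
    uint8_t  mac[6];
    uint16_t status;
};

static int device_cfg_read(const void *cfg, uint32_t cfg_len,
                           uint32_t offset, uint32_t len, void *out)
{
    if (len > cfg_len || offset > cfg_len - len)
        return -1;
    memcpy(out, (const uint8_t *)cfg + offset, len);
    return 0;
}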

-- 
MST





[virtio-dev] Re: [PATCH v17 03/11] content: Rename confusing queue_notify_data and vqn names

2023-05-15 Thread Cornelia Huck
On Fri, May 05 2023, Parav Pandit  wrote:

> diff --git a/content.tex b/content.tex
> index 9cf938c..327926f 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -405,8 +405,12 @@ \section{Driver Notifications} \label{sec:Basic 
> Facilities of a Virtio Device /
>  notification to the device.
>  
>  When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,
> -this notification involves sending the
> -virtqueue index to the device (method depending on the transport).
> +this notification contains either a virtqueue index if
> +VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied virtqueue
> +notification config data if VIRTIO_F_NOTIF_CONFIG_DATA is negotiated.
> +
> +The notification method and suppling any such virtqueue notification config 
> data
> +is transport specific.

Typo: s/suppling/supplying/ (trivial to fix on top)





[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang



On 2023/5/11 21:02, Parav Pandit wrote:



From: Michael S. Tsirkin 
Sent: Thursday, May 11, 2023 8:55 AM

1) device features
2) driver features
3) queue address
4) queue size
5) queue select
6) queue notify
7) device status
8) ISR status
9) config msix
10) queue msix
11) device configuration space


#9 may not even go to the group owner device.



It is the config_msix_vector register, so I don't understand why it can't
work but can work with your proposal.




What do we gain from bisecting it?



1) No need to deal with the offset in the hardware

2) transport independent

3) avoid duplication with the (legacy support of) transport virtqueue



Every new addition needs involvement of the hypervisor, as Michael noted below on
the "effort" point.



I'm not sure I get your point.

If additional effort is not wanted, we need a full PCI over adminq, 
that's what I suggested, then we don't need any extension for any new 
features probably.


If we choose to invent dedicated adminq commands:

For legacy, we don't need any new additional effort since the above 11 
maps to the legacy virtio-pci facilities.


For modern, if the feature is something related to the transport, we 
need new transport interface for sure.





The register offset and read/write is far simpler interface for hypervisor.



If this is true, why not go full PCI over adminq? The hypervisor in this
case doesn't even need to deal with config space.





Not to do cross register access is what we need to define in the spec for 1.x 
or future things.



You can't assume the hypervisor behaviors, so the device or at least the 
spec should be ready for that.


Thanks





It focuses on the facilities instead of transport-specific details
like registers (we don't even need legacy registers in this case). It
gives more deterministic behavior so we don't need to care about
cross-register reads/writes.

This needs thought, it is definitely more work.  Effort that could be maybe 
spent
on new features.  What is the motivation here? supporting legacy mmio guests?






[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang



On 2023/5/11 20:54, Michael S. Tsirkin wrote:

On Thu, May 11, 2023 at 03:04:40PM +0800, Jason Wang wrote:

On Wed, May 10, 2023 at 3:43 PM Michael S. Tsirkin  wrote:

On Wed, May 10, 2023 at 03:01:25PM +0800, Jason Wang wrote:

On Wed, May 10, 2023 at 2:05 PM Michael S. Tsirkin  wrote:

On Mon, May 08, 2023 at 10:23:39AM +0800, Jason Wang wrote:

I thought so too originally. Unfortunately I now think that no, legacy is not
going to be a byproduct of transport virtqueue for modern -
it is different enough that it needs dedicated commands.

If you mean the transport virtqueue, I think some dedicated commands
for legacy are needed. Then it would be a transport that supports
transitional devices. It would be much better than having commands for
a partial transport like this patch did.

OK I am beginning to get what you are saying.  So your criticism is
this: what if device supports vq transport for modern, and we want to
build a transitional device on top.  how will that look. yes?

Yes. I think it needs to be done through the transport virtqueue
otherwise the transport is not self-contained.

I mean, any feature can be done over transport vq.

But there is value in adding legacy commands to an otherwise
modern device without reworking it completely to
switch to a different transport.

There's probably no need for a rework since legacy is not complicated.
More below.




A reasonable thing to include at least in the commit log. Parav?

You are also asking what if the device uses transport vq,
and we want transitional on top of that.
It's a fair question but I don't exactly get why would
this legacy support feature be wanted for the vq transport
and not for other transports.

Not sure I get the question, but all the existing transports support
legacy; if we want to have another, should legacy support be a
must or not?

This specific proposal is for tunneling legacy over admin vq.
It can live alongside a normal modern VF, with hypervisor
combining these to create a transitional device.

Exactly, but what I meant here is

If we decide to use the admin vq, is there any good reason to tie it
to PCI if we don't want to tunnel PCI over adminq?

Why not simply invent individual commands to access legacy facilities,
like what the transport virtqueue did for modern
devices?:

1) device features
2) driver features
3) queue address
4) queue size
5) queue select
6) queue notify
7) device status
8) ISR status
9) config msix
10) queue msix
11) device configuration space

It focuses on the facilities instead of transport-specific details
like registers (we don't even need legacy registers in this case). It
gives more deterministic behavior so we don't need to care about
cross-register reads/writes.
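For concreteness only: such per-facility commands might be numbered along
these lines. None of these opcode names or values exist in the spec; they just
illustrate the shape of the alternative being discussed.

#include <stdint.h>

/* Hypothetical opcodes mirroring the 11 legacy facilities listed above. */
enum legacy_facility_cmd {
    CMD_LEGACY_DEV_FEATURES_READ   = 0x1000,
    CMD_LEGACY_DRV_FEATURES_WRITE  = 0x1001,
    CMD_LEGACY_QUEUE_ADDRESS_WRITE = 0x1002,
    CMD_LEGACY_QUEUE_SIZE_READ     = 0x1003,
    CMD_LEGACY_QUEUE_SELECT        = 0x1004,
    CMD_LEGACY_QUEUE_NOTIFY        = 0x1005,
    CMD_LEGACY_DEV_STATUS_RW       = 0x1006,
    CMD_LEGACY_ISR_READ            = 0x1007,
    CMD_LEGACY_CONFIG_MSIX_RW      = 0x1008,
    CMD_LEGACY_QUEUE_MSIX_RW       = 0x1009,
    CMD_LEGACY_DEV_CFG_RW          = 0x100a,
};

/* Each command targets one group member and carries a small payload,
 * e.g. the selected queue index or the feature bits. */
struct legacy_facility_cmd_hdr {
    uint16_t opcode;          /* one of enum legacy_facility_cmd */
    uint16_t reserved;
    uint64_t group_member_id; /* which VF the access targets */
    /* command-specific payload follows */
};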

This needs thought, it is definitely more work.  Effort that could be
maybe spent on new features.  What is the motivation
here? supporting legacy mmio guests?



Probably. It tries to make legacy transport independent, which is the
way the transport virtqueue wants to be.


Thanks





[virtio-dev] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ

2023-05-15 Thread Jason Wang



On 2023/5/11 21:28, Parav Pandit wrote:



From: Jason Wang 
Sent: Thursday, May 11, 2023 3:07 AM

4th point in cover letter is: "config virtqueues of the managed device".

This work belongs to the driver -> device direct communication using
a queue from driver to device.
So, I imagine this work can be done using a queue by the guest
driver and serviced by the device like how a guest driver configures
the queue today without any mediation.
For the PCI or MMIO transport, surely this can be done by the PCI device
directly, be it a PF, VF or SIOV.
(Instead of config registers, using a new queue interface.) Looks fine to me.

Great.


Can this new cfg queue be mediated like CVQ is, in sw? Maybe yes.
Should it always be mediated when it is a VF or SIOV device? Mostly
no, because it is yet another VQ for PF, VF, SIOV.

Yes, the mediation is always conditional but not mandated when doing the
design.


I am yet to parse rest of the 4 patches, please give me some time to review

it.

I went over the past work of [1], [2].

[1]
https://lore.kernel.org/all/20220826100034.200432-2-lingshan.zhu@intel
.com/ [2]
https://lists.oasis-open.org/archives/virtio-comment/202208/msg00141.h
tml

The "virtio q as transport" in [2] is bit misleading as its only role is to

transport the _registers_ of the SIOV_R1 device through its parent PF.

Rest of the work is the pure management work to manage the life cycle of

SIOV devices (create/delete/configure/compose).

I think this can be tweaked for static provisioning setups like SR-IOV. E.g 
group
owner can choose to not advertise the life cycle management commands. Then
the hypervisor may know the provisioning is transport specific.


It is just a bunch of admin commands in the new SIOV or virtual virtio device 
chapter.
It is not a transport chapter.



Provisioning could be a part of the transport. For example SR-IOV allows 
static and dynamic provisioning. If a virtio device needs to be 
implemented in VF, it needs to be provisioned first.






And the motivation is also clear: to provide composing of a virtio device for the
guest VM, for backward compatibility with the 1.x part of the specification.

It seems fine and indeed orthogonal to me: it is for backward
compatibility for already defined config fields for the existing guest VM driver.

It does not conflict with the cfgq/cmdq idea whose main purpose is for the
new config fields and new use cases that don't require any mediation.

Such a cfgq works across PF, VF, SF/SIOV devices in a uniform way without
mediation.

My understanding is that the cfgq/cmdq could be done on top, since the
commands are part of transport unless I miss something.


On top of what?
cfgq/cmdq is just another queue like any other VQ.
It is orthogonal to accessing 1.x registers using the group owner's queue.



I think there hasn't been any proposal that calls any virtqueue something
like cfgq/cmdq. So using that term may cause a lot of misunderstanding. I think
the context here is to provide or extend transport facilities via a
virtqueue, not something like the control virtqueue.






It also scales device register memory very well in a predictable way.

The registers and feature bits access described in [2] can certainly be done
using new non-legacy AQ commands.

Both legacy and non-legacy commands have different semantics, as Michael
mentioned.

The AQ semantics that Michael did as opposed to "virtqueue as transport" fits

well for the use case described in [1].

There are ongoing changes in the MSI-X area and SIOV, so you might want to
wait for them.

I think you meant IMS. Actually, it's not hard to add new commands to support
a general IMS. We can start from a virtio-specific one (as it looks like a common
requirement not only for PCI/SIOV but also for transports like MMIO).


IMS configuration is usually supposed to go directly to the device.



Admin virtqueue doesn't prevent anything from letting IMS go directly 
to the device.




  Long discussion with Thomas on IMS topic.
I will avoid diverting to unrelated discussion for now.


Or proposed command in [1] should be tagged as siov_r1, then things will be

cleaner.

Maybe we can say, one of the implementations is siov_r1, since it can be used
to implement devices in other transports. One of the main motivations for
transport virtqueue is to reduce the transport specific resources to ease the
virtio device implementation.


Yes, the main motivation is backward compatibility for the currently
defined features.


With that I don't see legacy 3 commands anyway conflict with [1].

It doesn't, but it's not elegant since:


I don't see how turning 3 commands into 9 commands makes it elegant by
bisecting register offsets in the hypervisor.



Or it needs to be done by the hardware and cross register write/read 
needs to be handled there as well.




It's the same thing using more opcodes.



Do we really care about the number of opcodes? I think we've reserved 
sufficient spaces.






1) Modern transport, admin virtqueue