[virtio-dev] Re: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Jason Wang



On 2023/6/28 18:10, Michael S. Tsirkin wrote:

On Wed, Jun 28, 2023 at 11:46:22AM +0800, Jason Wang wrote:

On Wed, Jun 28, 2023 at 12:35 AM Heng Qi  wrote:

1. Currently, a received encapsulated packet has an outer and an inner header,
but the virtio device is unable to calculate the hash for the inner header. The
same flow can traverse through different tunnels, resulting in the encapsulated
packets being spread across multiple receive queues (refer to the figure below).
However, in certain scenarios, we may need to direct these encapsulated packets
of the same flow to a single receive queue. This facilitates the processing
of the flow by the same CPU to improve performance (warm caches, less locking,
etc.).

   client1                   client2
      |       +---------+       |
      +------>| tunnels |<------+
              +---------+
                  |  |
                  v  v
           +-----------------+
           | monitoring host |
           +-----------------+

To achieve this, the device can calculate a symmetric hash based on the inner
headers of the same flow.

2. Legacy systems may lack the entropy fields that modern protocols carry in
the outer header, resulting in multiple flows with the same outer header but
different inner headers being directed to the same receive queue. This results
in poor receive performance.

To address this limitation, the inner header hash enables the device to
advertise the capability to calculate the hash for the inner packet, regaining
better receive performance.
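The effect of a symmetric inner-header hash can be sketched in a few lines of C. This is purely illustrative: mix32() is a toy mixing function, not the Toeplitz algorithm the spec actually uses, and all names here are invented. The point shown is only that ordering the inner address/port pairs before mixing makes the hash direction-agnostic, so both directions of one inner flow select the same receive queue regardless of which tunnel carried the packet.

```c
#include <stdint.h>

/* Toy mixer, NOT the spec's Toeplitz algorithm; for illustration only. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9e3779b1u;           /* arbitrary odd multiplier */
    return h ^ (h >> 16);
}

uint32_t inner_symmetric_hash(uint32_t inner_saddr, uint32_t inner_daddr,
                              uint16_t inner_sport, uint16_t inner_dport)
{
    /* Order each pair so (A->B) and (B->A) produce the same hash input. */
    uint32_t lo_addr = inner_saddr < inner_daddr ? inner_saddr : inner_daddr;
    uint32_t hi_addr = inner_saddr < inner_daddr ? inner_daddr : inner_saddr;
    uint16_t lo_port = inner_sport < inner_dport ? inner_sport : inner_dport;
    uint16_t hi_port = inner_sport < inner_dport ? inner_dport : inner_sport;

    uint32_t h = mix32(0, lo_addr);
    h = mix32(h, hi_addr);
    return mix32(h, ((uint32_t)lo_port << 16) | hi_port);
}
```

Because the function takes only inner-header fields, the outer (tunnel) header cannot influence queue selection at all.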

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/173
Signed-off-by: Heng Qi 
Reviewed-by: Xuan Zhuo 
Reviewed-by: Parav Pandit 
---
v18->v19:
 1. Have a single structure instead of two. @Michael S. Tsirkin
 2. Some small rewrites. @Michael S. Tsirkin
 3. Rebase to master.

v17->v18:
 1. Some rewording suggestions from Michael (Thanks!).
 2. Use 0 to disable inner header hash and remove
VIRTIO_NET_HASH_TUNNEL_TYPE_NONE.
v16->v17:
 1. Some small rewrites. @Parav Pandit
 2. Add Parav's Reviewed-by tag (Thanks!).

v15->v16:
 1. Remove the hash_option. In order to delimit the inner header hash and RSS
    configuration, the ability to configure the outer src udp port hash is
    given to RSS. This is orthogonal to inner header hash, which will be done
    in the RSS capability extension topic (considered as an RSS extension
    together with the symmetric toeplitz hash algorithm, etc.). @Parav Pandit
    @Michael S. Tsirkin
 2. Fix a 'field' typo. @Parav Pandit

v14->v15:
 1. Add tunnel hash option suggested by @Michael S. Tsirkin
 2. Adjust some descriptions.

v13->v14:
 1. Move supported_hash_tunnel_types from config space into cvq command.
@Parav Pandit

I may miss some discussions, but this complicates the provisioning a lot.

Having it in the config space, then a type agnostic provisioning
through config space + feature bits just works fine.

If we move it only via cvq, we need device specific provisioning interface.

Thanks

Yea that's what I said too. Debugging too.  I think we should build a
consistent solution that allows accessing config space through DMA,
separately from this effort.  Parav do you think you can live with this
approach so this specific proposal can move forward?



We can probably go another way: invent a new device configuration space
capability with a fixed size, like the PCI configuration access capability?


struct virtio_pci_cfg_cap {
        struct virtio_pci_cap cap;
        u8 dev_cfg_data[4]; /* Data for device configuration space access. */
};

So it won't grow as the size of device configuration space grows.
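The fixed-size window pattern can be sketched as follows. This is only an illustration of the access pattern, not spec text: names like cfg_window and driver_cfg_read32 are invented here, and the device side is modeled as a plain function rather than real MMIO. The driver programs an offset, the device fills a fixed 4-byte data window, so the capability footprint stays constant no matter how large the device configuration space grows.

```c
#include <stdint.h>
#include <string.h>

/* Invented for illustration: an offset register plus a fixed 4-byte window. */
struct cfg_window {
    uint32_t offset;    /* driver-written: byte offset into config space */
    uint8_t  data[4];   /* device-filled on each access */
};

/* Device side: serve the window from the full (possibly large) config blob. */
void device_service_window(struct cfg_window *win,
                           const uint8_t *cfg, size_t cfg_len)
{
    memset(win->data, 0, sizeof(win->data));
    if (win->offset < cfg_len) {
        size_t n = cfg_len - win->offset;
        if (n > sizeof(win->data))
            n = sizeof(win->data);
        memcpy(win->data, cfg + win->offset, n);
    }
}

/* Driver side: read one 32-bit field at a given config-space offset. */
uint32_t driver_cfg_read32(struct cfg_window *win,
                           const uint8_t *cfg, size_t cfg_len, uint32_t off)
{
    uint32_t v;

    win->offset = off;
    device_service_window(win, cfg, cfg_len); /* stands in for the MMIO round trip */
    memcpy(&v, win->data, sizeof(v));
    return v;
}
```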

Thanks

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] [RFC PATCH] admin-queue: bind the group member to the device

2023-06-28 Thread Jason Wang
On Wed, Jun 28, 2023 at 11:55 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jun 28, 2023 at 02:06:32PM +0800, Xuan Zhuo wrote:
> > On Wed, 28 Jun 2023 10:49:45 +0800, Jason Wang  wrote:
> > > On Tue, Jun 27, 2023 at 6:54 PM Xuan Zhuo  wrote:
> > > >
> > > > On Tue, 27 Jun 2023 17:00:06 +0800, Jason Wang  wrote:
> > > > > On Tue, Jun 27, 2023 at 4:28 PM Xuan Zhuo  wrote:
> > > > > >
> > > > > >
> > > > > > Thanks Parav for pointing it out. We may have some gaps on the case.
> > > > > >
> > > > > > Let me introduce our case, which I think is simple and should be
> > > > > > easy to understand.
> > > > > >
> > > > > > First, the user (customer) purchased a bare metal machine.
> > > > > >
> > > > > > ## Bare metal machine
> > > > > >
> > > > > > Let me briefly explain the characteristics of a bare metal machine. 
> > > > > > It is not a
> > > > > > virtual machine, it is a physical machine, and the difference 
> > > > > > between it and a
> > > > > > general physical machine is that its PCI is connected to a device 
> > > > > > similar to a
> > > > > > DPU. This DPU provides devices such as virtio-blk/net to the host 
> > > > > > through PCI.
> > > > > > These devices are managed by the vendor, and must be created and 
> > > > > > purchased
> > > > > > on the vendor's management platform.
> > > > > >
> > > > > > ## DPU
> > > > > >
> > > > > > There is a software implementation in the DPU, which will respond 
> > > > > > to PCI
> > > > > > operations. But as mentioned above, resources such as network cards 
> > > > > > must be
> > > > > > purchased and created before they can exist. So users can create 
> > > > > > VF, which is
> > > > > > just a pci-level operation, but there may not be a corresponding 
> > > > > > backend.
> > > > > >
> > > > > > ## Management Platform
> > > > > >
> > > > > > The creation and configuration of devices is realized on the 
> > > > > > management
> > > > > > platform.
> > > > > >
> > > > > > After the user completed the purchase on the management platform 
> > > > > > (this is an
> > > > > > independent platform provided by the vendor and has nothing to do 
> > > > > > with
> > > > > > virtio), then there will be a corresponding device implementation 
> > > > > > in the DPU.
> > > > > > This includes some user configurations, available bandwidth 
> > > > > > resources and other
> > > > > > information.
> > > > > >
> > > > > > ## Usage
> > > > > >
> > > > > > Since the user is directly on the HOST, the user can create VMs, 
> > > > > > passthrough PF
> > > > > > or VF into the VM. Or users can create a large number of dockers, 
> > > > > > all of which
> > > > > > use a separate virtio-net device for performance.
> > > > > >
> > > > > > The reason why users use vf is that we need to use a large number 
> > > > > > of virtio-net
> > > > > > devices. This number reaches 1k+.
> > > > > >
> > > > > > Based on this scenario, we need to bind vf to the backend device. 
> > > > > > Because, we
> > > > > > cannot automatically complete the creation of the virtio-net 
> > > > > > backend device when
> > > > > > the user creates a vf.
> > > > > >
> > > > > > ## Migration
> > > > > >
> > > > > > In addition, let's consider another scenario of migration. If a vm 
> > > > > > is migrated
> > > > > > from another host, of course its corresponding virtio device is 
> > > > > > also migrated to
> > > > > > the DPU. At this time, our newly created vf can only be used by the 
> > > > > > vm after it
> > > > > > is bound to the migrated device. We do not want this vf to be a 
> > > > > > brand new
> > > > > > device.
> > > > > >
> > > > > > ## Abstraction
> > > > > >
> > > > > > So, this is how I understand the process of creating vf:
> > > > > >
> > > > > > 1. Create a PCI VF, at this time there may be no backend virtio 
> > > > > > device, or there
> > > > > > is only a default backend. It does not fully meet our 
> > > > > > expectations.
> > > > > > 2. Create device or migrate device
> > > > > > 3. Bind the backend virtio device to the vf
> > > > >
> > > > > 3) should come before 2)?
> > > > >
> > > > > Who is going to do 3) btw, is it the user? If yes, for example, if a
> > > > > user wants another 4-queue virtio-net device, after purchase, how
> > > > > does the user know its id?
> > > >
> > > > Got the id from the management platform.
> > >
> > > So it can do the binding via that management platform, which then becomes
> > > a cloud vendor specific interface.
> >
> > In our scenario, this is bound by the user using this id and the vf id in
> > the OS.
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > In most scenarios, the first step may be enough. We can make some 
> > > > > > fine-tuning on
> > > > > > this default device, such as modifying its mac. In the future, we 
> > > > > > can use admin
> > > > > > queue to modify its msix vector and other configurations.
> > > > > >
> > > > > > But we should allow, we bind a backend virtio device to a certain 

[virtio-dev] Re: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Jason Wang
On Thu, Jun 29, 2023 at 12:00 AM Parav Pandit  wrote:
>
>
>
> > From: Jason Wang 
> > Sent: Wednesday, June 28, 2023 1:38 AM
>
> [..]
> > > Provisioning is a far simpler thing to do in a device-specific way than
> > > asking the device to store this value in an on-chip area which is rarely
> > > accessed.
> >
> > Are you suggesting to not place any new fields in the config space?
> >
> Yes.

It's not the fault of config space itself, but the way to access the
config space, more below.

>
> > struct virtio_net_config {
> > u8 mac[6];
> > le16 status;
> > le16 max_virtqueue_pairs;
> > le16 mtu;
> > le32 speed;
> > u8 duplex;
> > u8 rss_max_key_size;
> > le16 rss_max_indirection_table_length;
> > le32 supported_hash_types;
> > };
> >
> > Which of the above do you think can be accessed frequently and which part of
> > the spec says it must be stored in the onchip area?
> >
> Most are not accessed frequently.
> The fact that they are in MMIO means a device needs to place them in memory
> with a tight latency budget.

This really depends on the implementation and vendor architectures.
For example,

1) MSI BAR might require much more resources than a simple device
configuration space
2) I was told by some vendors that the virtqueue is much more valuable than MMIO
3) We can introduce new ways to access the device configuration space

> Spec is not going to talk about the on-chip area; it is the reflection of the
> spec that forces certain inefficient implementations.

Exactly, it's implementation specific, so config space is fine; we
just need to invent new methods to access it that fit your
hardware.

Thanks





[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Heng Qi




On 2023/6/29 9:56 AM, Parav Pandit wrote:



From: Michael S. Tsirkin 
Sent: Wednesday, June 28, 2023 3:45 PM

Maybe I get it. You want to use the new features as a carrot to
force drivers to implement DMA? You suspect they will ignore the
spec requirement just because things seem to work?


Right because it is not a must normative.

Well SHOULD also does not mean "ok to just ignore".

This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.


RECOMMENDED and SHOULD force the device to support MMIO, which is not good.
So rather, a good design is: the device tells the driver the starting offset of
the extended config space.
And the extended config space MUST be accessed using DMA.
With this, sw can have infinite-size MMIO, and a hw device forces DMA based on
its implementation of where to start DMA from.
This also gives the ability to maintain the current config as MMIO for backward
compatibility.
  



There's some logic here, for sure. You just might be right.

However, surely we can discuss this small tweak in 1.4 timeframe?

Sure, if we prefer the DMA approach I don't have a problem in adding a
temporary field to config space.

I propose to add a line to the spec "Device Configuration Space"
section, something like:

Note: Any new device configuration space field addition MUST consider
accessing such fields via a DMA interface.

And this will guide the new patches on what to do instead of a last-moment rush.

Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching to
MMIO might be an option for qemu, helping us debug DMA issues.


There are too many queues whose debugging is needed, and MMIO is likely not the
way to debug.
  

The time to discuss this detail would be around when the proposal for DMA
access to config space is on the list, though: I feel this SHOULD vs MUST is a
small enough detail.


From an implementation POV it is certainly a critical and good step forward to
optimize the virtio interface.
  

Going back to inner hash. If we move supported_tunnels back to config space,
do you feel we still need GET, or just drop it? I note we do not have GET for
either hash or rss config.


For hash and rss config, debugging is missing. :)
Yes, we can drop the GET after switching supported_tunnels to struct
virtio_net_hash_config.


Great! Glad to hear this!

  

And if we no longer have GET, is there still a reason for a separate command as
opposed to a field in virtio_net_hash_config?
I know this was done in v11, but there it was misaligned.
We went with a command because we needed it for supported_tunnels, but now
that is no longer the case, and there are reserved words in
virtio_net_hash_config ...

Let me know how you feel about that, not critical for me.

struct virtio_net_hash_config reserved is fine.


+1.

Inner header hash is orthogonal to RSS, and it's fine for it to have its own
structure and commands.
There is no need to send additional RSS fields when we configure the inner
header hash.


Thanks.






[virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Wednesday, June 28, 2023 3:45 PM
> > > Maybe I get it. You want to use the new features as a carrot to
> > > force drivers to implement DMA? You suspect they will ignore the
> > > spec requirement just because things seem to work?
> > >
> > Right because it is not a must normative.
> 
> Well SHOULD also does not mean "ok to just ignore".
> 
>   This word, or the adjective "RECOMMENDED", mean that there
>  may exist valid reasons in particular circumstances to ignore a
>  particular item, but the full implications must be understood and
>  carefully weighed before choosing a different course.
>
RECOMMENDED and SHOULD force the device to support MMIO, which is not good.
So rather, a good design is: the device tells the driver the starting offset of
the extended config space.
And the extended config space MUST be accessed using DMA.
With this, sw can have infinite-size MMIO, and a hw device forces DMA based on
its implementation of where to start DMA from.
This also gives the ability to maintain the current config as MMIO for backward
compatibility.
 
> 
> 
> > > There's some logic here, for sure. you just might be right.
> > >
> > > However, surely we can discuss this small tweak in 1.4 timeframe?
> >
> > Sure, if we prefer the DMA approach I don't have a problem in adding a
> > temporary field to config space.
> >
> > I propose to add a line to the spec "Device Configuration Space"
> > section, something like:
> >
> > Note: Any new device configuration space field addition MUST consider
> > accessing such fields via a DMA interface.
> >
> > And this will guide the new patches on what to do instead of a last-moment
> > rush.
> 
> Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching
> to MMIO might be an option for qemu, helping us debug DMA issues.
>
There are too many queues whose debugging is needed, and MMIO is likely not the
way to debug.
 
> The time to discuss this detail would be around when the proposal for DMA
> access to config space is on the list, though: I feel this SHOULD vs MUST is
> a small enough detail.
>
From an implementation POV it is certainly a critical and good step forward to
optimize the virtio interface.
 
> Going back to inner hash. If we move supported_tunnels back to config space,
> do you feel we still need GET or just drop it? I note we do not have GET for
> either hash or rss config.
>
For hash and rss config, debugging is missing. :)
Yes, we can drop the GET after switching supported_tunnels to struct
virtio_net_hash_config.
 
> And if we no longer have GET, is there still a reason for a separate command
> as opposed to a field in virtio_net_hash_config?
> I know this was done in v11, but there it was misaligned.
> We went with a command because we needed it for supported_tunnels, but
> now that is no longer the case and there are reserved words in
> virtio_net_hash_config ...
> 
> Let me know how you feel about that, not critical for me.

struct virtio_net_hash_config reserved is fine.




[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 05:38:29PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, June 28, 2023 1:24 PM
> 
> 
> > > Because when the device has two ways to access config space, it always
> > > has to account for and build the interface assuming some driver will not
> > > use DMA.
> > > Hence have it always in the MMIO-accessible area.
> > 
> > Maybe I get it. You want to use the new features as a carrot to force
> > drivers to implement DMA? You suspect they will ignore the spec requirement
> > just because things seem to work?
> >
> Right because it is not a must normative.

Well SHOULD also does not mean "ok to just ignore".

This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.



> > There's some logic here, for sure. you just might be right.
> > 
> > However, surely we can discuss this small tweak in 1.4 timeframe?
> 
> Sure, if we prefer the DMA approach I don't have a problem in adding a
> temporary field to config space.
> 
> I propose to add a line to the spec "Device Configuration Space" section,
> something like:
> 
> Note: Any new device configuration space field addition MUST consider
> accessing such fields via a DMA interface.
> 
> And this will guide the new patches on what to do instead of a last-moment rush.

Yea, except again I'd probably make it a SHOULD: e.g. I can see how
switching to MMIO might be an option for qemu, helping us debug DMA
issues.

The time to discuss this detail would be around when the proposal for DMA
access to config space is on the list, though: I feel this SHOULD vs MUST
is a small enough detail.

Going back to inner hash. If we move supported_tunnels back to config space,
do you feel we still need GET or just drop it? I note we do not
have GET for either hash or rss config.

And if we no longer have GET is there still a reason for a separate
command as opposed to a field in virtio_net_hash_config?
I know this was done in v11 but there it was misaligned.
We went with a command because we needed it for supported_tunnels
but now that is no longer the case and
there are reserved words in virtio_net_hash_config ...

Let me know how you feel about that, not critical for me.


-- 
MST





[virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Wednesday, June 28, 2023 1:24 PM


> > Because when the device has two ways to access config space, it always has
> > to account for and build the interface assuming some driver will not use DMA.
> > Hence have it always in the MMIO-accessible area.
> 
> Maybe I get it. You want to use the new features as a carrot to force
> drivers to implement DMA? You suspect they will ignore the spec requirement
> just because things seem to work?
>
Right because it is not a must normative.
 
> There's some logic here, for sure. you just might be right.
> 
> However, surely we can discuss this small tweak in 1.4 timeframe?

Sure, if we prefer the DMA approach I don't have a problem in adding a
temporary field to config space.

I propose to add a line to the spec "Device Configuration Space" section,
something like:

Note: Any new device configuration space field addition MUST consider
accessing such fields via a DMA interface.

And this will guide the new patches on what to do instead of a last-moment rush.






[virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Wednesday, June 28, 2023 1:16 PM

> > 1. Because then the device is 100% sure that it does not fulfill MMIO needs.
> > 2. Single interface to access extended config space, which is what you
> > asked for.
> > 3. Single driver code because there is a single way to get it.
> >
> 
> Right. But I think there's a better way.
> Just say that any feature in MMIO MUST be also in DMA.
> So any modern driver will have no reason at all to use MMIO - DMA is a
> superset.
>
How does that relax the need for MMIO on the device?

> If we say that drivers should use DMA, they will.  If we additionally explain
> that some features might not be in MMIO, no sane driver will use MMIO if it
> does not have to.
> Or if it does, then it has violated the spec and will get less features?
> 
So then why have it in MMIO anyway?
Why not say the driver must use DMA?

> > This demands two ways of access, and that conflicts with your stated desire
> > for a single way.
> > I proposed a single way, extended config space via a DMA interface; that is
> > really as simple as that.
> 
> 
> My desire is to have a single way that works for everything.
> We have legacy so we will have legacy ways too, these might not work for
> everything.
Here is what can work well:

A device tells the driver the offset of the extended config space.
Any field starting from that offset must be accessed via a DMA interface.
This way, the driver must support DMA, and all new features come from there.

If a device chooses to use MMIO, say a sw device, it can still do it using MMIO.
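As a rough sketch of this split (all names here are hypothetical, not spec text), the driver-side decision is just a comparison against the device-advertised offset: fields below it may go over MMIO, fields at or past it must go over DMA.

```c
#include <stdint.h>

/* Hypothetical capability field: where "extended" config space begins. */
struct dev_caps {
    uint32_t ext_cfg_offset;   /* device-advertised start of extended space */
};

enum cfg_access { CFG_VIA_MMIO, CFG_VIA_DMA };

/* Fields below the advertised offset may stay MMIO; the rest must use DMA. */
enum cfg_access cfg_access_method(const struct dev_caps *caps,
                                  uint32_t field_off)
{
    return field_off >= caps->ext_cfg_offset ? CFG_VIA_DMA : CFG_VIA_MMIO;
}
```

A sw device could simply advertise a very large offset and keep serving everything via MMIO, matching the backward-compatibility point above.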
 




[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 05:06:40PM +, Parav Pandit wrote:
> > > >
> > > > Please let people just focus on fixing config space instead of
> > > > temporary cvq hacks.
> > >
> > > The ask is to have predictable sizing for the existing defined config
> > > space fields and not keep growing infinitely via two interfaces.
> > 
> > I don't know what "predictable sizing" is or why does it matter.
> > 
> Because when the device has two ways to access config space, it always has to
> account for and build the interface assuming some driver will not use DMA.
> Hence have it always in the MMIO-accessible area.

Maybe I get it. You want to use the new features as a carrot to force
drivers to implement DMA? You suspect they will ignore the spec
requirement just because things seem to work?

There's some logic here, for sure. You just might be right.

However, surely we can discuss this small tweak in 1.4 timeframe?

-- 
MST





[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 05:06:40PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, June 28, 2023 12:46 PM
> 
> > Actually, my problem is exactly if there's no one single interface that can
> > do everything.
> >
> Extended config space reading via single DMA interface is just enough.
> 
> > 
> > Two interfaces that do the same might seem redundant but there could be good
> > reasons for this, in this case legacy.
> >
> But legacy should not force it onto the newer interface. More below.
>  
> > See the difference?
> > 
> 
> > > I didn't propose two interfaces. Let me explain again.
> > >
> > > I proposed,
> > > a. all new fields to go via vq (single interface)
> > > b. all existing fields to stay in cfg space (single interface)
> > > c. if one wants to optimize, optionally existing fields can be utilized
> > > (the same flexibility you asked for with PF and VF notification).
> > >
> > >
> > > > We already need config space with provisioning.
> > > Provisioning does provision features, config space and control parameters,
> > msix vectors and more.
> > > Provisioning does _not_ need config space.
> > 
> > I mean it needs to provision it.
> > 
> No matter whether it is config space or DMA, a device has to be provisioned.
> 
> > > > In fact e.g. on Linux we want drivers to keep doing virtio_cread()
> > > > and all the DMA tricks should be hidden behind the scenes. You don't
> > > > like it that config space is on-chip? Build an interface for it not
> > > > to be on chip. There is simply no downside to that.
> > > >
> > > This is the ask: build a software interface between driver and device
> > > that does not demand on-chip config space, more importantly _predictably_.
> > 
> > Absolutely. So we would say that the driver SHOULD use the DMA interface if
> > it is present.
> > 
> I am asking that the driver MUST use the DMA interface for _extended_ config
> space.
> Why?
> For these reasons:
> 
> 1. Because then the device is 100% sure that it does not fulfill MMIO needs.
> 2. Single interface to access extended config space, which is what you asked
> for.
> 3. Single driver code because there is a single way to get it.
> 

Right. But I think there's a better way.
Just say that any feature in MMIO MUST also be in DMA.
So any modern driver will have no reason at all to use MMIO - DMA
is a superset.


> > > > I came back after a small break and I still see this discussion that
> > > > keeps playing out as a big distraction.
> > > > In particular where are you doing to store the things that are for DMA?
> > > > In system RAM? We don't *have* anywhere in system RAM to store this,
> > > > we will need an interface to allocate system RAM and pass it to
> > > > device for this idea to work. So in 1.3 where we won't have this,
> > > > there is much less of an excuse to use a vq.
> > > To store what?
> > > CVQ is already there to do GET and SET operation.
> > 
> > To store whatever value CVQ is requesting. Where is it going to come from?
> >
> It is going to come from device memory that does not have the hard
> requirements of MMIO accessibility.
> For example, slow flash memory.
>  
> > Asynchronous is just harder for software.  Look at the hoops virtio net has
> > to jump through to do its control path over cvq, and weep. Compare to a
> > one-line virtio_cread.
> > 
> > I happen to maintain Linux drivers too, so I care.
> > 
> It is harder the first time if no such construct exists; and that is clearly
> not the case.
> This patch is already adding CVQ get and set commands.

I expect GET to mostly go unused.

> Async behavior is already built into the virtio spec.
> Extending it for extended config space just makes the device cheaper to build.
> The software cost is paid only once at device init time, just like for CVQ,
> AQ and more.
> 
> > > >
> > > > Please let people just focus on fixing config space instead of
> > > > temporary cvq hacks.
> > >
> > > The ask is to have predictable sizing for the existing defined config
> > > space fields and not keep growing infinitely via two interfaces.
> > 
> > I don't know what "predictable sizing" is or why does it matter.
> > 
> Because when the device has two ways to access config space, it always has to
> account for and build the interface assuming some driver will not use DMA.
> Hence have it always in the MMIO-accessible area.

If we say that drivers should use DMA, they will.  If we additionally
explain that some features might not be in MMIO, no sane driver will use
MMIO if it does not have to.
Or if it does, then it has violated the spec and will get less features?




> > I get the value of a reduced MMIO register map.
> > I get that the only way to get that appears to be to pass values around
> > using DMA.
> >
> Ok. great.
>  
> > I don't think we need to throw away config space - just use DMA to access it.
> > 
> > And "existing defined" here seems to be at a completely random point in time.
> > If someone wants to have fields in MMIO, I see no reason not to.
> 
> 

[virtio-dev] Re: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 04:46:04PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, June 28, 2023 6:41 AM
> 
> > > > So just quickly add the new capability in the spec, and then the
> > > > number of Linux releases that will have the new feature but not the
> > > > config command (or whatever that is) will be too small for vendors to
> > > > care.
> > > >
> > > I didn't follow this suggestion.
> > 
> > It is very simple though. 1.3 has inner hash feature. Imagine that instead 
> > of
> > endless flamewars rehashing same arguments, immediately post
> 
For sure. If you call this discussion a war, I didn't start it, and I didn't
delay it to this point either :)

Yea sorry I'm being harsh, the threads here grow too much. We all need
to make an effort not to repeat ourselves.

> The patch added GET and SET commands; GET returns the device's static and
> dynamic values, without burdening the device, in a consistent way.
> Will moving to config space take away the GET command? No?
> 
> Then what did one gain other than the extra complexity of additional config
> space? Just more device memory burning.

I don't think Linux will ever use GET. It's there for completeness, I'm
fine with getting rid of it too.


> > 1.3 we vote on an interface to access config space through DMA (not sure
> > it's a VQ BTW, but that's a separate discussion).  This should include a
> > way to expose a subset of DMA features through MMIO, for compatibility.
> 
> I think you are missing the main point that I tried to highlight, but
> presumably it didn't come across. :(
> 
> Having an optional DMA interface does _not_ help the device to optimize,
> because the device must support old drivers.

Yea, I don't get it, sorry.
So where's this old driver that uses the inner hash feature?
Maybe it will be released down the road, right?
Why can't it at the same time support DMA for access to this field?






> The proposal I explained in the previous email is:
> a. All new fields via vq.
> This means it is _guaranteed_ to be off-chip, and there is zero reason for
> backward compatibility, because there is no backward compat for non-existing
> fields.
> 
> b. All existing fields stay in cfg space.
> 
> c. Optionally, existing fields can also be queried via cfg space.


And what I say is even simpler:
all config space is accessible through DMA,
some of it optionally in MMIO.

Which fields to have in MMIO will be up to the device.
Research what the drivers you care about use, and include that.






> > If we are lucky guest support can land in the same Linux release that has 
> > inner
> > hash.  Drivers do not need to care: they just do virtio_cread and that is 
> > it.
> >
> virtio_cread for extended config space accessible only via DMA or VQ will 
> surely be hidden inside the virtio_cread(). Easy.
>  
> If we agree on this approach, great, one additional field like this can be 
> added to config space.

I think we are pretty close then, great!


> > So a vendor that builds device with inner hash, can expose inner hash only
> > through DMA and not through MMIO.
> > 
> > I conclude that there's no reason to block this feature just because it 
> > uses config
> > space.
> >
> I must repeat that I am not blocking; I replied way back on both the 
> approaches, in v14 or even before that.
> And I also explained part of the proposal.
> 
> This is why one should discuss the design and not keep asking for patches.

I certainly don't prevent anyone from posting design sketches.  The problem
is this: even spec patches are often hard to understand, ungrammatical,
etc. Not sure what to do here, but it's understandable that people only
start reviewing in earnest when the text gets halfway readable.


> > 
> > > > >
> > > > > > A good implementation of virtio_cread can abstract that easily
> > > > > > so we don't need to change drivers.
> > > > >
> > > > > There is no backward compat issue for the GET command being new.
> > > >
> > > > It's just a shortcut replacing what we really want.  As long as a
> > > > shortcut is available people will keep using exactly that.  So I
> > > > fully expect more proposals for such GET commands on the pretext
> > > > that one is there so why not another one. Adding more tech debt for
> > > > whoever finally gets around to building a config space access gateway.
> > > >
> > > Not really. As suggested, the first addition of a new field to the config 
> > > space in the 1.4 time frame should add the cfgvq, and not follow the 
> > > previous example.
> > > Because this is being thought through now, it is not at all hard for any 
> > > new things to follow the guideline.
> > 
> > Really. Oh great, so there will be 3 ways to provision things!
> >
> Not sure how you concluded 3 ways...
> Provisioning is via a single command via the AQ by the owner device.

heh but what does this command get? if everything
device specific is in
config space it can just get config space layout exactly
and we have no extra work. if there's stuff not in config

[virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Wednesday, June 28, 2023 12:46 PM

> Actually, my problem is exactly if there's no one single interface that can do
> everything.
>
Extended config space reading via a single DMA interface is just enough.

> 
> Two interfaces that do the same might seem redundant but there could be good
> reasons for this, in this case legacy.
>
But legacy should not force it onto the newer interface. More below.
 
> See the difference?
> 

> > I didn't propose two interfaces. Let me explain again.
> >
> > I proposed,
> > a. all new fields to go via vq (single interface) b. all existing
> > fields to stay in cfg space (single interface) c. if one wants to
> > optimize optionally existing fields can utilize (same flexibility how you 
> > asked
> for PF and VF notification).
> >
> >
> > > We already need config space with provisioning.
> > Provisioning does provision features, config space and control parameters,
> msix vectors and more.
> > Provisioning does _not_ need config space.
> 
> I mean it needs to provision it.
> 
Whether or not config space is accessed via DMA, a device has to be provisioned.

> > > In fact e.g. on Linux we want drivers to keep doing virtio_cread()
> > > and all the DMA tricks should be hidden behind the scenes. You don't
> > > like it that config space is on-chip? Build an interface for it not
> > > to be on chip. There is simply no downside to that.
> > >
> > This is the ask: build a software interface between driver and device that 
> > does not demand on-chip config space and, more importantly, does so 
> > _predictably_.
> 
> Absolutely. So we would say that driver SHOULD use the DMA interface if it is
> present.
> 
I am asking that the driver MUST use the DMA interface for the _extended_ 
config space.
Why?
For three reasons.

1. Because then the device is 100% sure that it does not have to serve those 
fields within MMIO latency budgets.
2. A single interface to access extended config space, which is what you asked 
for.
3. A single driver code path, because there is a single way to get it.
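The single-code-path argument above can be sketched in driver terms: a virtio_cread()-style wrapper that routes offsets inside the existing config window to MMIO and every extended offset to the DMA/VQ channel. This is a toy model under my own assumptions — `EXT_CFG_START`, the `toy_*` names, and both backends are illustrative, not anything from the spec or the Linux driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_CFG_START 0x100   /* assumed first offset of the extended region */

struct toy_vdev {
    uint8_t mmio[EXT_CFG_START]; /* on-chip, MMIO-backed fields */
    uint8_t ext[0x100];          /* off-chip fields, reachable only via DMA/VQ */
};

static void toy_read_mmio(struct toy_vdev *d, size_t off, void *buf, size_t len)
{
    memcpy(buf, d->mmio + off, len);  /* stands in for register reads */
}

static void toy_read_dma(struct toy_vdev *d, size_t off, void *buf, size_t len)
{
    /* stands in for a DMA/VQ transaction against the extended region */
    memcpy(buf, d->ext + (off - EXT_CFG_START), len);
}

/* The one call drivers keep making; the routing stays hidden inside. */
static void toy_cread(struct toy_vdev *d, size_t off, void *buf, size_t len)
{
    if (off < EXT_CFG_START)
        toy_read_mmio(d, off, buf, len);
    else
        toy_read_dma(d, off, buf, len);
}
```

With a single cutoff offset, the device knows at build time that nothing at or above `EXT_CFG_START` ever needs an MMIO-latency path, which is exactly the predictability being argued for.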


> > > I came back after a small break and I still see this discussion that
> > > keeps playing out as a big distraction.
> > > In particular, where are you going to store the things that are for DMA?
> > > In system RAM? We don't *have* anywhere in system RAM to store this,
> > > we will need an interface to allocate system RAM and pass it to
> > > device for this idea to work. So in 1.3 where we won't have this,
> > > there is much less of an excuse to use a vq.
> > To store what?
> > CVQ is already there to do GET and SET operations.
> 
> To store whatever value CVQ is requesting. Where is it going to come from?
>
It is going to come from device memory that does not have hard requirements of 
MMIO accessibility.
For example, slow flash memory.
 
> Asynchronous is just harder for software.  Look at the hoops virtio net has to
> jump through to do the control path over cvq, and weep. Compare to the 
> one-liner virtio_cread.
> 
> I happen to maintain Linux drivers too, so I care.
> 
It is harder the first time if no such construct exists, and that is clearly 
not the case here.
This patch is already adding CVQ GET and SET commands.
Async behavior is already built into the virtio spec.
Extending it for extended config space just makes the device cheaper to build.
The software cost is paid only once at device init time, just like for CVQ, AQ 
and more.

> > >
> > > Please let people just focus on fixing config space instead of
> > > temporary cvq hacks.
> >
> > The ask is to have predictable sizing for the existing defined config space 
> > fields and not keep growing infinitely via two interfaces.
> 
> I don't know what "predictable sizing" is or why it matters.
> 
Because when a device has two ways to access config space, it always has to 
account for the driver that will not use DMA, and build for it.
Hence it must keep the fields always in the MMIO-accessible area.

> I get the value of reduced MMIO register map.
> I get that the only way to get that appears to be to pass values around using
> DMA.
>
Ok. great.
 
> I don't think we need to throw away config space - just use DMA to access it.
> 
> And "existing defined" here seems to be at a completely random point in time.
> If someone wants to have fields in MMIO, I see no reason not to.

This demands two ways of access, which conflicts with your stated desire for a 
single way.
I proposed a single way, extended config space via a DMA interface; it is 
really as simple as that.

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] RE: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash

2023-06-28 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Wednesday, June 28, 2023 6:41 AM

> > > So just quickly add the new capability in the spec and then the
> > > number of linux releases that will have the new feature but not
> > > config command or whatever that is will be too small for vendors to care.
> > >
> > I didn't follow this suggestion.
> 
It is very simple though. 1.3 has the inner hash feature. Imagine that instead 
of endless flamewars rehashing the same arguments, immediately post

For sure if you call this discussion a war, I didn't start it and I didn't 
delay it to this point either :)

The patch added GET and SET commands; GET returns the device's static and 
dynamic values in a consistent way, without burdening the device.
Will moving to config space take away the GET command? No.

Then what does one gain, other than the extra complexity of additional config 
space? Just more device memory burned.

> 1.3 we vote on an interface to access config space through DMA (not sure it's 
> a
> VQ BTW but that's a separate discussion).  This should include a way to expose
> a subset of DMA features through MMIO, for compatibility.

I think you are missing the main point that I tried to highlight; presumably 
it didn't come across. :(

Having an optional DMA interface does _not_ help the device to optimize, 
because the device must support old drivers.

The proposal I explained in the previous email is:
a. All new fields via VQ.
This means they are _guaranteed_ to be off-chip, and there is zero backward 
compatibility concern, because non-existing fields have no backward compat.

b. All existing fields stay in config space.

c. Optionally, existing fields can also be queried via the VQ.

> If we are lucky guest support can land in the same Linux release that has 
> inner
> hash.  Drivers do not need to care: they just do virtio_cread and that is it.
>
virtio_cread for extended config space accessible only via DMA or VQ will 
surely be hidden inside the virtio_cread(). Easy.
 
If we agree on this approach, great, one additional field like this can be 
added to config space.

> So a vendor that builds device with inner hash, can expose inner hash only
> through DMA and not through MMIO.
> 
> I conclude that there's no reason to block this feature just because it uses 
> config
> space.
>
I must repeat that I am not blocking; I replied way back on both the 
approaches, in v14 or even before that.
And I also explained part of the proposal.

This is why one should discuss the design and not keep asking for patches.
 
> 
> > > >
> > > > > A good implementation of virtio_cread can abstract that easily
> > > > > so we don't need to change drivers.
> > > >
> > > > There is no backward compat issue for the GET command being new.
> > >
> > > It's just a shortcut replacing what we really want.  As long as a
> > > shortcut is available people will keep using exactly that.  So I
> > > fully expect more proposals for such GET commands on the pretext
> > > that one is there so why not another one. Adding more tech debt for
> > > whoever finally gets around to building a config space access gateway.
> > >
> > Not really. As suggested, the first addition of a new field to the config 
> > space in the 1.4 time frame should add the cfgvq, and not follow the 
> > previous example.
> > Because this is being thought through now, it is not at all hard for any 
> > new things to follow the guideline.
> 
> Really. Oh great, so there will be 3 ways to provision things!
>
Not sure how you concluded 3 ways...

Provisioning is via a single command via the AQ by the owner device.
New config fields go via a VQ of the member device itself; again, a single way.
Existing config fields are accessed via config space; a single way.

> I have not seen this patch yet.  And how long will this take to materialize? 
> I don't believe all TC work must be blocked until this happens, or 
> alternatively use ad-hoc hacks.
Asking for the patch while the design discussion is still ongoing, and then 
claiming it is not ready, is surely not the right way.

> 
> I get it you want to save on chip memory. So work on a consistent solution for
> this.
> 
> All config space accesses should go through DMA.
This is not predictable due to the backward compat need.
Hence the ask is that all *new* config space accesses go through DMA.

> Thus no special guidelines should be necessary, and drivers can just keep 
> doing
> virtio_cread like they always did.
>
This can be very easily achieved in software by knowing the extended config 
space offset at which to use DMA.
 
> To add to that, it will allow cheaper devices as some existing config space 
> will be
> able to move quickly.
This is possible only when the device side can know early enough, through a 
new feature bit and new registers for DMA, which will pay off for a larger 
extended config space region.


I want to summarize my humble input:

1. All new extended config space fields to be accessed only via DMA/VQ by the 
member device itself.
2. All existing fields to stay in config space without any interface change.
3. 

[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 04:18:02PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, June 28, 2023 6:27 AM
> > 
> > On Wed, Jun 28, 2023 at 04:23:09AM +, Parav Pandit wrote:
> > >
> > > > From: Jason Wang 
> > > > Sent: Tuesday, June 27, 2023 11:46 PM
> > > >
> > > > Having it in the config space, then a type agnostic provisioning
> > > > through config space + feature bits just works fine.
> > > >
> > > Provisioning is a far simpler thing to do in a device-specific way than 
> > > asking the device to store this value in an on-chip area which is rarely 
> > > accessed.
> > 
> > I thought a lot about this over the weekend. I just do not see this working 
> > out,
> > sorry.  Two things are never easier than one.
> 
> You proposed two things and made them mandatory.
> I am glad that you came around to using one interface.
> 
> In a different context, you proposed two interfaces for driver notification 
> on the PF and/or VF BAR in a separate thread, for flexibility. I asked for one.
> Somehow you promoted flexibility (and also complexity that no vendor wanted), 
> and here you lean towards a single interface...
> 
> So more explanation of the rationale would help understanding.

OK I will try.

Actually, my problem is exactly if there's no one single
interface that can do everything.


Two interfaces that do the same might seem redundant but
there could be good reasons for this, in this case legacy.

See the difference?

> I didn't propose two interfaces. Let me explain again.
> 
> I proposed, 
> a. all new fields to go via vq (single interface)
> b. all existing fields to stay in cfg space (single interface)
> c. if one wants to optimize, optionally existing fields can utilize the VQ 
> (the same flexibility you asked for with PF and VF notification).
> 
> 
> > We already need config space with provisioning.  
> Provisioning does provision features, config space and control parameters, 
> msix vectors and more.
> Provisioning does _not_ need config space.

I mean it needs to provision it.

> > In fact e.g. on Linux we want drivers to keep doing virtio_cread()
> > and all the DMA tricks should be hidden behind the scenes. You don't like 
> > it that
> > config space is on-chip? Build an interface for it not to be on chip. There 
> > is
> > simply no downside to that.
> > 
> This is the ask: build a software interface between driver and device that 
> does not demand on-chip config space and, more importantly, does so 
> _predictably_.

Absolutely. So we would say that driver SHOULD use the DMA interface
if it is present.

> > I came back after a small break and I still see this discussion that keeps 
> > playing
> > out as a big distraction.
> > In particular, where are you going to store the things that are for DMA?
> > In system RAM? We don't *have* anywhere in system RAM to store this, we will
> > need an interface to allocate system RAM and pass it to device for this 
> > idea to
> > work. So in 1.3 where we won't have this, there is much less of an excuse 
> > to use
> > a vq.
> To store what?
> CVQ is already there to do GET and SET operation.

To store whatever value CVQ is requesting. Where is it going to
come from?


> > 
> > 
> > 
> > 
> > > There are many fields of many features being discussed for 1.4, including 
> > > this one. All of these capabilities just cannot be stored in config space.
> > 
> > config space is a synchronous interface, a vq is asynchronous.  Setup is 
> > just too messy to do asynchronously. So yes, config space.
> > 
Huh, I disagree. The whole virtio spec has VQs all over, and you say that the 
VQ is messy.
That doesn't make any sense at all.

Asynchronous is just harder for software.  Look at the hoops virtio net
has to jump through to do the control path over cvq, and weep. Compare to the
one-liner virtio_cread.

I happen to maintain Linux drivers too, so I care.
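The contrast being drawn — one synchronous read versus a build-submit-wait command cycle — can be shown with a toy model. This is illustrative only: the `toy_*` names are made up, and the "device" here completes commands instantly, where real hardware completes asynchronously and the driver needs sleep/wakeup machinery instead of the spin loop below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_dev { uint8_t cfg[256]; };

/* Synchronous path: a single call, like virtio_cread(). */
static uint16_t toy_cread16(const struct toy_dev *d, unsigned off)
{
    uint16_t v;
    memcpy(&v, d->cfg + off, sizeof(v));
    return v;
}

/* Asynchronous path: explicit command buffer plus a completion to wait on. */
struct toy_cmd { unsigned off; uint16_t out; int done; };

static void toy_cvq_submit(const struct toy_dev *d, struct toy_cmd *c)
{
    memcpy(&c->out, d->cfg + c->off, sizeof(c->out));
    c->done = 1; /* the model completes instantly; hardware would not */
}

static uint16_t toy_cvq_read16(const struct toy_dev *d, unsigned off)
{
    struct toy_cmd c = { .off = off, .out = 0, .done = 0 };
    toy_cvq_submit(d, &c);
    while (!c.done)
        ; /* placeholder for the sleep/wakeup machinery a real driver needs */
    return c.out;
}
```

Both paths return the same bytes; the difference is purely how much scaffolding the caller carries, which is the software-cost point on each side of this argument.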

> > 
> > Please let people just focus on fixing config space instead of temporary cvq
> > hacks.
> 
> The ask is to have predictable sizing for the existing defined config space 
> fields and not keep growing infinitely via two interfaces.

I don't know what "predictable sizing" is or why it matters.

I get the value of reduced MMIO register map.
I get that the only way to get that appears to be to pass values
around using DMA.

I don't think we need to throw away config space - just use DMA to access it.

And "existing defined" here seems to be at a completely random point in time.
If someone wants to have fields in MMIO, I see no reason not to.

-- 
MST





[virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Wednesday, June 28, 2023 6:27 AM
> 
> On Wed, Jun 28, 2023 at 04:23:09AM +, Parav Pandit wrote:
> >
> > > From: Jason Wang 
> > > Sent: Tuesday, June 27, 2023 11:46 PM
> > >
> > > Having it in the config space, then a type agnostic provisioning
> > > through config space + feature bits just works fine.
> > >
> > Provisioning is a far simpler thing to do in a device-specific way than 
> > asking the device to store this value in an on-chip area which is rarely 
> > accessed.
> 
> I thought a lot about this over the weekend. I just do not see this working 
> out,
> sorry.  Two things are never easier than one.

You proposed two things and made them mandatory.
I am glad that you came around to using one interface.

In a different context, you proposed two interfaces for driver notification on 
the PF and/or VF BAR in a separate thread, for flexibility. I asked for one.
Somehow you promoted flexibility (and also complexity that no vendor wanted), 
and here you lean towards a single interface...

So more explanation of the rationale would help understanding.

I didn't propose two interfaces. Let me explain again.

I proposed, 
a. all new fields to go via vq (single interface)
b. all existing fields to stay in cfg space (single interface)
c. if one wants to optimize, optionally existing fields can utilize the VQ 
(the same flexibility you asked for with PF and VF notification).


> We already need config space with provisioning.  
Provisioning does provision features, config space and control parameters, msix 
vectors and more.
Provisioning does _not_ need config space.

> In fact e.g. on Linux we want drivers to keep doing virtio_cread()
> and all the DMA tricks should be hidden behind the scenes. You don't like it 
> that
> config space is on-chip? Build an interface for it not to be on chip. There is
> simply no downside to that.
> 
This is the ask: build a software interface between driver and device that 
does not demand on-chip config space and, more importantly, does so 
_predictably_.

> I came back after a small break and I still see this discussion that keeps 
> playing
> out as a big distraction.
> In particular, where are you going to store the things that are for DMA?
> In system RAM? We don't *have* anywhere in system RAM to store this, we will
> need an interface to allocate system RAM and pass it to device for this idea 
> to
> work. So in 1.3 where we won't have this, there is much less of an excuse to 
> use
> a vq.
To store what?
CVQ is already there to do GET and SET operations.

> 
> 
> 
> 
> > There are many fields of many features being discussed for 1.4, including 
> > this one. All of these capabilities just cannot be stored in config space.
> 
> config space is a synchronous interface, a vq is asynchronous.  Setup is just 
> too messy to do asynchronously. So yes, config space.
> 
Huh, I disagree. The whole virtio spec has VQs all over, and you say that the 
VQ is messy.
That doesn't make any sense at all.

> 
> Please let people just focus on fixing config space instead of temporary cvq
> hacks.

The ask is to have predictable sizing for the existing defined config space 
fields and not keep growing infinitely via two interfaces.




[virtio-dev] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Parav Pandit


> From: Jason Wang 
> Sent: Wednesday, June 28, 2023 1:38 AM

[..]
> > Provisioning is a far simpler thing to do in a device-specific way than 
> > asking the device to store this value in an on-chip area which is rarely 
> > accessed.
> 
> Are you suggesting to not place any new fields in the config space?
> 
Yes.

> struct virtio_net_config {
> u8 mac[6];
> le16 status;
> le16 max_virtqueue_pairs;
> le16 mtu;
> le32 speed;
> u8 duplex;
> u8 rss_max_key_size;
> le16 rss_max_indirection_table_length;
> le32 supported_hash_types;
> };
> 
> Which of the above do you think can be accessed frequently and which part of
> the spec says it must be stored in the onchip area?
> 
Most are not accessed frequently.
The fact that they are in MMIO means a device needs to place them in memory 
with a tight latency budget.
The spec is not going to talk about an on-chip area; it is a reflection of the 
spec that forces certain inefficient implementations.
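To put a rough number on the footprint under discussion, the fields quoted above can be laid out as a packed struct. The offsets below follow my reading of the virtio 1.2 virtio-net config layout and should be treated as illustrative; the point is simply that this is on the order of 24 bytes that today must stay MMIO-accessible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed mirror of the struct virtio_net_config quoted in the thread.
 * Config-space fields are byte-packed on the wire, hence the attribute. */
struct toy_virtio_net_config {
    uint8_t  mac[6];
    uint16_t status;
    uint16_t max_virtqueue_pairs;
    uint16_t mtu;
    uint32_t speed;
    uint8_t  duplex;
    uint8_t  rss_max_key_size;
    uint16_t rss_max_indirection_table_length;
    uint32_t supported_hash_types;
} __attribute__((packed));
```

A couple of dozen bytes per device is small in isolation; the argument in the thread is about what happens when this region keeps growing across spec revisions and thousands of VFs.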


[virtio-dev] Re: [virtio-comment] [RFC PATCH] admin-queue: bind the group member to the device

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 02:06:32PM +0800, Xuan Zhuo wrote:
> On Wed, 28 Jun 2023 10:49:45 +0800, Jason Wang  wrote:
> > On Tue, Jun 27, 2023 at 6:54 PM Xuan Zhuo  
> > wrote:
> > >
> > > On Tue, 27 Jun 2023 17:00:06 +0800, Jason Wang  
> > > wrote:
> > > > On Tue, Jun 27, 2023 at 4:28 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > >
> > > > > Thanks Parav for pointing it out. We may have some gaps on the case.
> > > > >
> > > > > Let me introduce our case, which I think is simple and should be easy 
> > > > > to understand.
> > > > >
> > > > > First, the user (customer) purchased a bare metal machine.
> > > > >
> > > > > ## Bare metal machine
> > > > >
> > > > > Let me briefly explain the characteristics of a bare metal machine. 
> > > > > It is not a
> > > > > virtual machine, it is a physical machine, and the difference between 
> > > > > it and a
> > > > > general physical machine is that its PCI is connected to a device 
> > > > > similar to a
> > > > > DPU. This DPU provides devices such as virtio-blk/net to the host 
> > > > > through PCI.
> > > > > These devices are managed by the vendor, and must be created and 
> > > > > purchased
> > > > > on the vendor's management platform.
> > > > >
> > > > > ## DPU
> > > > >
> > > > > There is a software implementation in the DPU, which will respond to 
> > > > > PCI
> > > > > operations. But as mentioned above, resources such as network cards 
> > > > > must be
> > > > > purchased and created before they can exist. So users can create a 
> > > > > VF, which is just a PCI-level operation, but there may not be a 
> > > > > corresponding backend.
> > > > >
> > > > > ## Management Platform
> > > > >
> > > > > The creation and configuration of devices is realized on the 
> > > > > management
> > > > > platform.
> > > > >
> > > > > After the user completed the purchase on the management platform 
> > > > > (this is an
> > > > > independent platform provided by the vendor and has nothing to do with
> > > > > virtio), then there will be a corresponding device implementation in 
> > > > > the DPU.
> > > > > This includes some user configurations, available bandwidth resources 
> > > > > and other
> > > > > information.
> > > > >
> > > > > ## Usage
> > > > >
> > > > > Since the user is directly on the HOST, the user can create VMs, 
> > > > > passthrough PF
> > > > > or VF into the VM. Or users can create a large number of dockers, all 
> > > > > of which
> > > > > use a separate virtio-net device for performance.
> > > > >
> > > > > The reason why users use vf is that we need to use a large number of 
> > > > > virtio-net
> > > > > devices. This number reaches 1k+.
> > > > >
> > > > > Based on this scenario, we need to bind the vf to the backend device, 
> > > > > because we cannot automatically complete the creation of the 
> > > > > virtio-net backend device when the user creates a vf.
> > > > >
> > > > > ## Migration
> > > > >
> > > > > In addition, let's consider another scenario of migration. If a vm is 
> > > > > migrated
> > > > > from another host, of course its corresponding virtio device is also 
> > > > > migrated to
> > > > > the DPU. At this time, our newly created vf can only be used by the 
> > > > > vm after it
> > > > > is bound to the migrated device. We do not want this vf to be a brand 
> > > > > new
> > > > > device.
> > > > >
> > > > > ## Abstraction
> > > > >
> > > > > So, this is how I understand the process of creating vf:
> > > > >
> > > > > 1. Create a PCI VF, at this time there may be no backend virtio 
> > > > > device, or there
> > > > > is only a default backend. It does not fully meet our 
> > > > > expectations.
> > > > > 2. Create device or migrate device
> > > > > 3. Bind the backend virtio device to the vf
> > > >
> > > > 3) should come before 2)?
> > > >
> > > > Who is going to do 3) btw, is it the user? If yes, for example, if a
> > > > user wants another 4 queue virtio-net devices, after purchase, how
> > > > does the user know its id?
> > >
> > > Got the id from the management platform.
> >
> > So it can do the binding via that management platform which this
> > became a cloud vendor specific interface.
> 
> In our scenario, this is bound by the user using this id and vf id in the os.
> 
> >
> > >
> > > >
> > > > >
> > > > > In most scenarios, the first step may be enough. We can make some 
> > > > > fine-tuning on
> > > > > this default device, such as modifying its mac. In the future, we can 
> > > > > use admin
> > > > > queue to modify its msix vector and other configurations.
> > > > >
> > > > > But we should allow, we bind a backend virtio device to a certain vf. 
> > > > > This is
> > > > > useful for live migration and virtio devices with special 
> > > > > configurations.
> > > >
> > > > All of these could be addressed if a dynamic provisioning model is
> > > > implemented (SIOV or transport virtqueue). Trying to have a workaround
> > > > in SR-IOV might be tricky.
> > >
> > 

[virtio-dev] Re: [virtio-comment] [RFC PATCH] admin-queue: bind the group member to the device

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 10:21:04AM +0800, Xuan Zhuo wrote:
> On Tue, 27 Jun 2023 12:02:41 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Tue, Jun 27, 2023 at 04:23:05PM +0800, Xuan Zhuo wrote:
> > > So, this is how I understand the process of creating vf:
> > >
> > > 1. Create a PCI VF, at this time there may be no backend virtio device, 
> > > or there
> > > is only a default backend. It does not fully meet our expectations.
> > > 2. Create device or migrate device
> > > 3. Bind the backend virtio device to the vf
> >
> > I can see this making sense as a feature bit that says VFs are not
> > initialized by default and must first be setup through an admin command.
> > This will likely need to be a feature bit because it's changing
> > behaviour outside of admin commands.
> >
> > Then, we can have:
> >
> > ADMIN_SETUP VF#
> > ADMIN_CLEANUP VF#
> >
> > I like this because this generalizes CREATE/DESTROY that SIOV guys proposed.
> 
> Great!!
> 
> >
> >
> > Why do we need an id as a level of indirection though? What is wrong
> > with just using VF# directly?
> 
> I think VF# is ok. I also need to use it. But we need an ID for the virtio 
> device, not the transport (PF, VF).
> 
> What I want to emphasize is that PCI(pf or vf) is a transport, it is only used
> to connect the virtio driver and the virtio device. Right?
> 
> The virtio device does not necessarily exist depending on PCI. For example, a
> virtio device is migrated from another DPU, and it is not associated with any
> PCI. What I have always wanted to say is that this device(not PCI) must have 
> its
> own ID, which has nothing to do with the transport.
> 
> Now we want to use this migrated device and connect it to the corresponding
> vm (migrated from the same host). We can passthrough vf to this vm. But how 
> do I
> tell our DPU to bind this migrated device with this vf?
> We can specify the VF by the VF#, but how can we specify the virtio device?
> Maybe there are two migrated virtio device.
> 
> Maybe we should compare it to the case of using PF directly before. The 
> biggest
> difference here is that PF is directly hot-plugged by the DPU, so when a
> user(custom) buys a virtio-net device, the DPU directly inserts a new PCI 
> device
> on the host. Now the vf is created by the user, and only the user knows how to
> use each one. We cannot directly bind the device purchased or migrated by the
> user to a certain vf. We need the user in the host to bind the vf(transport) 
> to
> a certain virtio device.
> 
> Thanks.
> 
> 

Maybe I didn't have enough coffee today but I can't figure out what all
this means :( For example what does "exist depending" mean?  What does
"device is migrated" mean?  What does it mean to be "migrated from the
same host"? What is "any PCI"? E.g. I know people did vm migration
moving virtio from a hardware device to a software
implementation. Is that "not associated with any PCI" ?
What is "user(custom)"? how is "the vf is created by the user"?
what does it mean to "bind the device .. to a certain vf"?
It looks like Parav understand what's going on though, so
maybe it's my fault.

But fundamentally, the problem is that the spec patch doesn't do
anything useful by itself, it relies on some out of spec interface to
make these id values make sense. So the TC is reduced to guessing: we
could just trust you that it's useful somehow and at the same time
serves the purpose whatever it is.

But it would be better if instead of trying to explain what that does in
vague terms, you post a spec patch that allows doing whatever needs
doing for these IDs to make sense through e.g. admin commands.

-- 
MST





[virtio-dev] RE: [virtio-comment] [RFC PATCH] admin-queue: bind the group member to the device

2023-06-28 Thread Parav Pandit



> From: Xuan Zhuo 
> Sent: Tuesday, June 27, 2023 10:21 PM

> > I can see this making sense as a feature bit that says VFs are not
> > initialized by default and must first be setup through an admin command.
> > This will likely need to be a feature bit because it's changing
> > behaviour outside of admin commands.
> >
> > Then, we can have:
> >
> > ADMIN_SETUP VF#
> > ADMIN_CLEANUP VF#
> >
> > I like this because this generalizes CREATE/DESTROY that SIOV guys proposed.
> 

What does this command actually do, or what is it expected to do, on the device?

> Great!!
> 
> >
> >
> > Why do we need an id as a level of indirection though? What is wrong
> > with just using VF# directly?
> 
> I think VF# is ok. I also need to use it. But we need an ID for the virtio 
> device, not the transport (PF, VF).
> 
> What I want to emphasize is that PCI(pf or vf) is a transport, it is only 
> used to
> connect the virtio driver and the virtio device. Right?
> 
> The virtio device does not necessarily exist depending on PCI. For example, a
> virtio device is migrated from another DPU, and it is not associated with any 
> PCI.
> What I have always wanted to say is that this device(not PCI) must have its 
> own
> ID, which has nothing to do with the transport.
> 
A virtio device can have an id visible to itself and an id visible to the 
group owner device.

> Now we want to use this migrated device and connect it to the corresponding
> VM (migrated from the same host). We can pass through a VF to this VM. But how
> do I tell our DPU to bind this migrated device to this VF?

When the virtio device state is set on a specific VF on the compute side (not
on the DPU side), this directly indicates to the DPU side which virtio device
is attached to which VF.

> We can specify the VF by the VF#, but how can we specify the virtio device?
> Maybe there are two migrated virtio devices.
>
The virtio device state setting itself will contain the device identifier.

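For concreteness, the setup/cleanup commands and the transport-independent device ID being discussed could look something like the sketch below. This is purely illustrative: neither the opcodes nor the structure layout exist in the spec; they only show how a "bind this device ID to this VF#" admin command might be shaped.

```c
#include <stdint.h>

/* Hypothetical opcodes, not in the spec. */
#define VIRTIO_ADMIN_CMD_SETUP_MEMBER   0x100
#define VIRTIO_ADMIN_CMD_CLEANUP_MEMBER 0x101

/* Hypothetical command payload: binds a transport-independent virtio
 * device ID (e.g. of a device migrated from another DPU) to a group
 * member identified by its VF#. */
struct virtio_admin_member_setup {
    uint64_t group_member_id; /* VF# within the group (the transport) */
    uint64_t device_id;       /* transport-independent virtio device ID */
};
```

Under this sketch, migration software would issue SETUP with the migrated device's ID and the chosen VF#, which is exactly the association the question above asks about.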



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Thu, Jun 22, 2023 at 08:27:48PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Thursday, June 22, 2023 3:02 PM
> 
> 
> > > Existing fields used by existing drivers must be in MMIO.
> > > Device does not have a choice.
> > 
> > They can, if they like, mask some less important features, reducing the
> > footprint.
> How to do that? How would the device know not to have existing fields in MMIO?

As you said yourself, feature bits also have a cost. They must be
accessed with DMA too, not through MMIO.
This set of features can have more bits than the one in MMIO.


> > This will be the cost/benefit analysis each vendor does separately.
> > 
> > Case in point, if inner hash support is in 1.3, and linux drivers don't 
> > really use
> > that at all for a year or two because it's only used by alibaba in their 
> > stack, and
> > by the time linux starts using them it also supports the new commands, then
> > practically devices do not care and can just mask the feature bit in MMIO.
> > 
> Once it lands in config space, it becomes an existing field.

What's in which version of the spec is immaterial.  If all guests using
it also support config space through DMA then it does not matter that
technically there was a spec version which had inner hash but not DMA.

> More below.
> > 
> 
> > > > If they want to cut at 1.3 time, they can, but they also can cut it
> > > > at 1.2 time, this means some features will not be accessible through
> > > > MMIO only through the new commands.
> > > >
> > > New commands services new fields by newer driver.
> > > Newer driver can call new command to get new and old fields both.
> > 
> > yes.
> > 
> > > So 1.3 and 1.4 devices cannot optimize for older drivers.
> > 
> > 
> > I don't know what this means.
> > 
> I guess next below question answers it.

I still don't know what you meant by "So 1.3 and 1.4 devices cannot
optimize for older drivers". If it's immaterial, fine.

-- 
MST





[virtio-dev] Re: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Thu, Jun 22, 2023 at 05:58:29PM +, Parav Pandit wrote:
> > > > > > So what is the solution proposed for this?
> > > > > >
> > > > > 1. Query member device capabilities via admin command
> > > >
> > > > But that's not 1.3 material.
> > > >
> > > > > > Yes the current migration is broken in many ways but that's what
> > > > > > we have. Let's build something better sure but that is not 1.3 
> > > > > > material.
> > > > >
> > > > > True, it is not 1.3 material, hence the proposal was to have the GET
> > command.
> > > > > Once/if we reach agreement that no new fields to be added to
> > > > > config space
> > > > starting 1.4 and should be queried using non intercepted cfgvq, it
> > > > makes sense to let this go in cfg space.
> > > > > Else GET command seems the elegant and right approach.
> > > >
> > > > I expect no such agreement at all. Instead, I expect that we'll have
> > > > an alternative way to access config space. guest virtio core then
> > > > needs to learn both ways, and devices can support one or both.
> > > >
> > > Yeah, we disagree.
> > > Because the alternative way that you propose is not a predictable way to
> > > build the device efficiently.
> > > It always needs to account for supporting old drivers.
> > > This is clearly sub-optimal as the capabilities grow.
> > 
> > So just quickly add the new capability to the spec, and then the number of
> > Linux releases that have the new feature but not the config command (or
> > whatever it is) will be too small for vendors to care about.
> > 
> I didn't follow this suggestion.

It is very simple though. 1.3 has inner hash feature. Imagine that
instead of endless flamewars rehashing same arguments, immediately post
1.3 we vote on an interface to access config space through DMA (not sure
it's a VQ BTW but that's a separate discussion).  This should include a
way to expose a subset of DMA features through MMIO, for compatibility.
If we are lucky guest support can land in the same Linux release that
has inner hash.  Drivers do not need to care: they just do virtio_cread
and that is it.

So a vendor that builds device with inner hash, can expose inner hash
only through DMA and not through MMIO.

I conclude that there's no reason to block this feature just
because it uses config space.





> > >
> > > > A good implementation of virtio_cread can abstract that easily so we
> > > > don't need to change drivers.
> > >
> > > There is no backward compat issue for the GET command being new.
> > 
> > It's just a shortcut replacing what we really want.  As long as a shortcut 
> > is
> > available people will keep using exactly that.  So I fully expect more 
> > proposals
> > for such GET commands on the pretext that one is there so why not another
> > one. Adding more tech debt for whoever finally gets around to building a 
> > config
> > space access gateway.
> >
> Not really. As suggested, the first addition of a new field to the config
> space in the 1.4 time frame should add the cfgvq, and not follow the previous
> example. Because this is being thought through now, it is not at all hard for
> any new things to follow the guideline.

Really. Oh great, so there will be 3 ways to provision things!

I have not seen this patch yet.  And how long will this take to
materialize? I don't believe all TC work must be blocked until this
happens, or must alternatively use ad-hoc hacks.

I get it, you want to save on-chip memory. So work on a consistent
solution for this.

All config space accesses should go through DMA.
Thus no special guidelines should be necessary, and drivers
can just keep doing virtio_cread like they always did.

To add to that, it will allow cheaper devices as
some existing config space will be able to move
quickly.
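As a sketch of the driver side of that argument (an assumption about structure, not the actual Linux implementation): a virtio_cread-style helper can dispatch through a small ops table, so the driver's call sites stay the same whether config space is reached through MMIO or through a DMA-based command.

```c
#include <stdint.h>
#include <string.h>

/* Backend abstraction: one implementation would touch MMIO registers,
 * another would issue a DMA/admin-command read. Hypothetical names. */
struct virtio_cfg_ops {
    void (*read)(void *priv, unsigned offset, void *buf, unsigned len);
};

struct virtio_dev {
    const struct virtio_cfg_ops *ops; /* selected at probe time */
    void *priv;
};

/* Drivers keep calling one helper regardless of transport. */
static inline uint32_t virtio_cread32(struct virtio_dev *d, unsigned off)
{
    uint32_t v;
    d->ops->read(d->priv, off, &v, sizeof(v));
    return v;
}

/* Toy backend for illustration: reads from an in-memory shadow of the
 * config space, standing in for either MMIO or a completed DMA read. */
static void shadow_read(void *priv, unsigned off, void *buf, unsigned len)
{
    memcpy(buf, (uint8_t *)priv + off, len);
}
```

The point being argued above is exactly this: with such an indirection in the core, drivers need no changes when a DMA path is added.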

-- 
MST





[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 04:23:09AM +, Parav Pandit wrote:
> 
> > From: Jason Wang 
> > Sent: Tuesday, June 27, 2023 11:46 PM
> > 
> > Having it in the config space, then a type agnostic provisioning through 
> > config
> > space + feature bits just works fine.
> >
> Provisioning is a far simpler thing to do in a device-specific way than
> asking the device to store this value in an on-chip area which is rarely
> accessed.

I thought a lot about this over the weekend. I just do not see this
working out, sorry.  Two things are never easier than one. We already
need config space with provisioning.  In fact e.g. on Linux we want
drivers to keep doing virtio_cread() and all the DMA tricks should be
hidden behind the scenes. You don't like it that config space is
on-chip? Build an interface for it not to be on chip. There is simply
no downside to that.

I came back after a small break and I still see this discussion
that keeps playing out as a big distraction.
In particular, where are you going to store the things that are for DMA?
In system RAM? We don't *have* anywhere in system RAM to store this,
we will need an interface to allocate system RAM and pass it to device
for this idea to work. So in 1.3 where we won't have this, there is
much less of an excuse to use a vq.




> There are many fields of many features being discussed for 1.4, including
> this one. All of these capabilities just cannot be stored in config space.

Config space is a synchronous interface; a vq is asynchronous.  Setup
is just too messy to do asynchronously. So yes, config space.


Please let people just focus on fixing config space instead of
temporary cvq hacks.

-- 
MST





[virtio-dev] Re: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Michael S. Tsirkin
On Wed, Jun 28, 2023 at 11:46:22AM +0800, Jason Wang wrote:
> On Wed, Jun 28, 2023 at 12:35 AM Heng Qi  wrote:
> >
> > 1. Currently, a received encapsulated packet has an outer and an inner 
> > header, but
> > the virtio device is unable to calculate the hash for the inner header. The 
> > same
> > flow can traverse through different tunnels, resulting in the encapsulated
> > packets being spread across multiple receive queues (refer to the figure 
> > below).
> > However, in certain scenarios, we may need to direct these encapsulated 
> > packets of
> > the same flow to a single receive queue. This facilitates the processing
> > of the flow by the same CPU to improve performance (warm caches, less 
> > locking, etc.).
> >
> >        client1                 client2
> >           |      +-------+        |
> >           +----->|tunnels|<-------+
> >                  +-------+
> >                    |   |
> >                    v   v
> >           +-----------------+
> >           | monitoring host |
> >           +-----------------+
> >
> > To achieve this, the device can calculate a symmetric hash based on the 
> > inner headers
> > of the same flow.
> >
> > 2. For legacy systems, they may lack entropy fields which modern protocols 
> > have in
> > the outer header, resulting in multiple flows with the same outer header but
> > different inner headers being directed to the same receive queue. This 
> > results in
> > poor receive performance.
> >
> > To address this limitation, inner header hash can be used to enable the 
> > device to advertise
> > the capability to calculate the hash for the inner packet, regaining better 
> > receive performance.
> >
> > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/173
> > Signed-off-by: Heng Qi 
> > Reviewed-by: Xuan Zhuo 
> > Reviewed-by: Parav Pandit 
> > ---
> > v18->v19:
> > 1. Have a single structure instead of two. @Michael S . Tsirkin
> > 2. Some small rewrites. @Michael S . Tsirkin
> > 3. Rebase to master.
> >
> > v17->v18:
> > 1. Some rewording suggestions from Michael (Thanks!).
> > 2. Use 0 to disable inner header hash and remove
> >VIRTIO_NET_HASH_TUNNEL_TYPE_NONE.
> > v16->v17:
> > 1. Some small rewrites. @Parav Pandit
> > 2. Add Parav's Reviewed-by tag (Thanks!).
> >
> > v15->v16:
> > 1. Remove the hash_option. In order to delimit the inner header 
> > hash and RSS
> >configuration, the ability to configure the outer src udp port 
> > hash is given
> >to RSS. This is orthogonal to inner header hash, which will be 
> > done in the
> >RSS capability extension topic (considered as an RSS extension 
> > together
> >with the symmetric toeplitz hash algorithm, etc.). @Parav Pandit 
> > @Michael S . Tsirkin
> > 2. Fix a 'field' typo. @Parav Pandit
> >
> > v14->v15:
> > 1. Add tunnel hash option suggested by @Michael S . Tsirkin
> > 2. Adjust some descriptions.
> >
> > v13->v14:
> > 1. Move supported_hash_tunnel_types from config space into cvq 
> > command. @Parav Pandit
> 
> I may have missed some discussions, but this complicates the provisioning a lot.
> 
> Having it in the config space, then a type agnostic provisioning
> through config space + feature bits just works fine.
> 
> If we move it only via cvq, we need device specific provisioning interface.
> 
> Thanks

Yea that's what I said too. Debugging too.  I think we should build a
consistent solution that allows accessing config space through DMA,
separately from this effort.  Parav do you think you can live with this
approach so this specific proposal can move forward?
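To illustrate the steering behaviour the feature is after: ordering the inner endpoint pair before mixing guarantees both directions of a flow hash identically and so land on the same receive queue. The hash below is a toy mixer for illustration only, not the symmetric Toeplitz function the spec text refers to, and the modulo stands in for the RSS indirection table.

```c
#include <stdint.h>

/* Inner (post-decapsulation) IPv4 flow tuple. */
struct inner_flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Simple boost-style combiner; a toy stand-in for the real hash. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    return h ^ (v + 0x9e3779b9u + (h << 6) + (h >> 2));
}

/* Symmetric: sorting addresses and ports before mixing makes
 * hash(A->B) == hash(B->A), so request and reply of one inner flow
 * always produce the same value regardless of the outer tunnel. */
static uint32_t symmetric_inner_hash(const struct inner_flow *f)
{
    uint32_t lo_addr = f->saddr < f->daddr ? f->saddr : f->daddr;
    uint32_t hi_addr = f->saddr < f->daddr ? f->daddr : f->saddr;
    uint16_t lo_port = f->sport < f->dport ? f->sport : f->dport;
    uint16_t hi_port = f->sport < f->dport ? f->dport : f->sport;

    uint32_t h = 0;
    h = mix(h, lo_addr);
    h = mix(h, hi_addr);
    h = mix(h, ((uint32_t)lo_port << 16) | hi_port);
    return h;
}

static unsigned rx_queue_for(const struct inner_flow *f, unsigned nqueues)
{
    return symmetric_inner_hash(f) % nqueues; /* indirection-table stand-in */
}
```

With an outer-header hash instead, the two directions (or two tunnels carrying the same flow) would generally pick different queues, which is the monitoring-host problem the patch description motivates.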

-- 
MST





Re: [virtio-dev][PATCH 2/2] virtio-spi: add the device specification

2023-06-28 Thread Haixu Cui

Hi @jrreinh...@google.com,
Thank you very much for your helpful comments.

I missed the delay and cs_change_delay parameters. I will add both
of them; although cs_change_delay cannot be set from userspace, it can
be set in kernel space.



For CS delays such as PRE_DELAY/POST_DELAY, it seems that they are
defined in struct spi_device:


@cs_setup: delay to be introduced by the controller after CS is 
asserted.


@cs_hold: delay to be introduced by the controller before CS is 
deasserted.


@cs_inactive: delay to be introduced by the controller after CS is
deasserted. If @cs_change_delay is used from @spi_transfer, then the
two delays will be added up.

They show the SPI controller's ability to control the CS
assertion/deassertion timing and should not be changed for each transfer
(they can be updated by setting struct spi_transfer or
struct spi_ioc_transfer). I think it is better to define these parameters
in the host OS rather than in the guest OS, since it is the host OS that
operates the hardware SPI controller directly. Besides, it also avoids
passing the same values from guest to host time after time.
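To make the discussion concrete, a per-transfer delay set covering the cases Jonathon lists might look like the following. This layout is purely hypothetical: the draft under discussion defines only word_delay, and the extra fields merely mirror Linux's struct spi_transfer and the NXP PRE_DELAY/POST_DELAY style parameters.

```c
#include <stdint.h>

/* Hypothetical per-transfer delay parameters for a virtio-spi request,
 * all in nanoseconds; NOT part of the proposed spec. */
struct virtio_spi_transfer_delays {
    uint32_t word_delay_ns;  /* between words within one transfer (in draft) */
    uint32_t cs_setup_ns;    /* PRE_DELAY: CS assert -> first SCLK edge */
    uint32_t cs_hold_ns;     /* POST_DELAY: last SCLK edge -> CS deassert */
    uint32_t cs_inactive_ns; /* minimum CS-deasserted time between transfers */
};
```

Whether such fields belong per transfer (guest-controlled) or per device (host-configured, as argued above) is exactly the open question in this subthread.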


What's your opinion on this topic? Again, thank you very much.

Best Regards
Haixu Cui

On 6/28/2023 1:06 AM, jrreinh...@google.com wrote:

Hi Haixu,


+The \field{word_delay} defines how long to wait between words within
one SPI transfer,
+in ns unit.


I'm a little surprised to see a word_delay but no frame_delay or
transfer_delay.

For example, many SPI peripherals require a delay after CS is asserted,
but before the first SCLK edge, allowing them to prepare to send data.
(E.g. an ADC might begin taking a sample.)


The linux struct spi_transfer has three delay fields:

* @cs_change_delay: delay between cs deassert and assert when
*  @cs_change is set and @spi_transfer is not the last in
*  @spi_message
* @delay: delay to be introduced after this transfer before
*    (optionally) changing the chipselect status, then starting the
*    next transfer or completing this @spi_message.
* @word_delay: inter word delay to be introduced after each word size
*    (set by bits_per_word) transmission.

The userspace spidev.h API has only two:

* @delay_usecs: If nonzero, how long to delay after the last bit
*   transfer before optionally deselecting the device before the
*   next transfer.
* @word_delay_usecs: If nonzero, how long to wait between words within
*   one transfer. This property needs explicit support in the SPI
*   controller, otherwise it is silently ignored.

The NXP i.MX RT500 SPI (master) controller, for example, has four
configurable delays:

- PRE_DELAY: After CS assertion, before first SCLK edge.
- POST_DELAY: After a transfer, before CS deassertion.
- FRAME_DELAY: Between frames (which are arbitrarily defined by
   software).
- TRANSFER_DELAY: Minimum CS deassertion time between transfers.

The NVIDIA Tegra114 SPI controller has:

- nvidia,cs-setup-clk-count: CS setup timing parameter.
- nvidia,cs-hold-clk-count: CS hold timing parameter.


I understand that accurately representing all possible delays might be
difficult or futile, but I'm curious to understand why, of all the
possible delays, inter-word delay was chosen for inclusion.

In a microcontroller, delays around CS (de)assertion can be customized
by using a GPIO -- rather than the hardware SPI block -- to drive the CS
signal. This way, delays can be inserted where needed. To do so with a
virtualized SPI controller might prove difficult however, as the virtual
channel carrying a CS GPIO signal might not be synchronized to the
channel carrying the SPI data.

Curious to hear your thoughts.

Thanks,
Jonathon Reinhart

