[virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-22 Thread Heng Qi
On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > 1) Add a feature bit to the virtio specification to tell the sender that a
> > fully csumed packet must be sent.
> 
> Who is the sender in this picture? The driver?

The device or the driver.

When the device is hardware, the sender is more likely to be the device.
When the device is software, the sender can be either the device or the driver.

But in general, this feature constrains the behavior of the device and
the driver from the receiving side.

For example:
VIRTIO_NET_F_UNNECESSARY_CSUM: the driver tells the device "you must send
me fully csumed packets".

Then the specific implementation can be:

(1) the sender sends a fully csumed packet; or
(2) the receiver receives a CHECKSUM_PARTIAL packet and the device computes
the full csum on its behalf (because the two parties in the communication
are located on the same host, the packet is trusted).

In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver will no 
longer receive any packets marked CHECKSUM_PARTIAL.
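
A minimal sketch of the device-side behavior this implies (assuming the
VIRTIO_NET_F_UNNECESSARY_CSUM bit proposed here; compute_csum_at() is a
hypothetical helper, while the header flags are the existing virtio ones):

    #include <linux/virtio_net.h>   /* struct virtio_net_hdr, flag bits */

    /* Sketch: with the proposed feature negotiated, the device never
     * delivers CHECKSUM_PARTIAL to the driver. */
    static void device_complete_rx_csum(struct virtio_net_hdr *hdr,
                                        void *pkt, size_t len)
    {
            if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
                    return; /* already fully csumed or unchecked */

            /* Both peers share a host, so the packet is trusted:
             * finish the csum the sender left partial (hypothetical
             * helper, using the sender-provided offsets). */
            compute_csum_at(pkt, len, hdr->csum_start, hdr->csum_offset);

            hdr->flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
            hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
    }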

Thanks.

> 
> -- 
> MST




[virtio-dev] Re: [virtio-comment] RE: [PATCH v2 1/2] transport-pci: Introduce legacy registers access commands

2023-05-22 Thread Michael S. Tsirkin
On Mon, May 22, 2023 at 09:05:13PM +, Parav Pandit wrote:
> > How was I supposed to raise this issue on v1 if it was not there?
> > Stop arguing in circles and things will move forward.
> v1 version clearly had AQ commands.
> 
> Please revisit your comments.

Just for fun, I did look it up, and there I asked:
What is this notify offset? It's never explained.

so yes, technically it was there, but without explanation.

-- 
MST





[virtio-dev] Re: [virtio-comment] RE: [PATCH v2 1/2] transport-pci: Introduce legacy registers access commands

2023-05-22 Thread Michael S. Tsirkin
On Mon, May 22, 2023 at 09:05:13PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Monday, May 22, 2023 4:07 PM
> > 
> > On Sun, May 21, 2023 at 02:44:03PM +, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin 
> > > > Sent: Sunday, May 21, 2023 10:33 AM
> > >
> > > > > Yet you initiate the same discussion point that we already
> > > > > discussed, again after summarizing.
> > > > > A driver is not attached to two devices.
> > > > > A driver is attached to a single device.
> > > >
> > > > And that device is the owner no? to send commands?
> > > >
> > > Not for legacy registers access as discussed before.
> > 
> > Now I am lost. Legacy register access is on top of the AQ, which sends
> > commands through the PF.  How you are going to send commands without
> > attaching a driver, I don't know.
> 
> A VF driver can send commands to its parent PF driver using a well-defined
> kernel API.
> There is no need to attach to the PF driver.
> Upper and lower devices are a common concept.
> > > Not limited to QEMU.
> > > A driver will be able to do this as well.
> > 
> > No, on some OS-es (windows, iokit, more) a single driver can't bind to many
> > devices without a lot of pain.  
> There is no need to bind. A well-defined kernel API is enough.
> 
> It's been a while since I worked on iokit, but iokit has similar kernel APIs.
> 
> It is very exciting to think that one has built an iokit-based hypervisor
> hosting PCI accelerated devices and VMs in it.
> Last time I checked, I was told that the virtio driver is not present in the
> latest OS; not sure how true it is.

iokit is from my previous work experience, not virtio.
For virtio we had a lot of trouble with hacks like this on Windows.
That's an important platform.

> > If you are mapping VF BAR, this has to be done by
> > VF driver, and please just avoid pain for everyone by exposing the necessary
> > info through the VF itself.
> > A capability seems perfect for this.
> > 
> Why not the MMIO region as proposed in v0?

Are you asking what the issues with v0 were? The issue with v0 was not MMIO
as such, at least for me.  The issue was that it did not give us any way to
work around bugs in the legacy interface. AQ was a fix for that.


> Capability-based read/write requires an extra redirection, so it is less
> optimal than MMIO.
> 
> > An alternative is not to map VF BAR at all, steal space from PF BAR instead.
> How come iokit will be able to access this PF BAR without binding to it?

I don't get what you are saying here.  You have to bind a PF driver.
Otherwise you cannot send AQ commands. That driver can map the PF BAR into
applications.


> > Guest is not accessing this anyway. So I really do not see why not from a
> > software point of view. There's a hardware reason? Could you talk to the
> > hardware guys about this? You objected to AQ too, then the hardware guys
> > told you it is not a big deal.
> I didn't object to AQ for a hardware reason.
> I explained that AQ is not the best compared to MMIO because, by the laws of
> physics, it is slower: it needs to move more bits than MMIO does.
> And so the capability-based tunneling is also slower than MMIO.
> > 
> > > > So we have legacy emulation send commands to VF or to PF.  Okay. But
> > > > let us avoid the need for VF driver to send commands to PF to 
> > > > initialize.
> > > > Just get all information it needs from VF itself.
> > > >
> > > >
> > > > Maybe it's a good idea to reuse existing notification capability, or
> > > > maybe a new one, but let's avoid making VF driver depend on PF
> > commands.
> > > >
> > > We agreed in v1 on Jason's suggestion to have the AQ command and yet
> > > you and Jason hinder this in v2 with this exact repeated question.
> > Let's please avoid this and move forward.
> > 
> > How was I supposed to raise this issue on v1 if it was not there?
> > Stop arguing in circles and things will move forward.
> v1 version clearly had AQ commands.
> 
> Please revisit your comments.
> v0 -> you suggested:
>  - "why not AQ, like transport vq, it can coalesce partial mac writes".
> 
> v1 -> over AQ, you suggested:
>  - to "drop the size"
>  - summarized to split the 2 commands into 4 commands, separating the
>    device and config areas.
> 
> After that summary, you propose "why not self-contained in the VF itself?",
> which circles back to the v0 method (suggesting a capability instead of
> MMIO, but not explaining why MMIO, which is better than a capability, is
> not fine).
> 
> And now you propose "have a BAR in the PF which is owned by the other PF
> driver", with little reasoning.
> And it conflicts with your suggestion of PF driver access...
> 
> So I am a bit lost on your circular proposal; or maybe it was just a
> probing question, not sure.

You did move things to AQ, however not completely.  You kept notification
in MMIO. If everything was in AQ we would be done as far as I am
concerned (though Jason still does not like the duplication of functionality
with the transport vq, but that is a separate issue).

As notification is in VF 

[virtio-dev] RE: [virtio-comment] RE: [PATCH v2 1/2] transport-pci: Introduce legacy registers access commands

2023-05-22 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Monday, May 22, 2023 4:07 PM
> 
> On Sun, May 21, 2023 at 02:44:03PM +, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Sunday, May 21, 2023 10:33 AM
> >
> > > > Yet you initiate same discussion point that we already discussed
> > > > again after
> > > summarizing.
> > > > A driver is not attached to two devices.
> > > > A driver is attached to a single device.
> > >
> > > And that device is the owner no? to send commands?
> > >
> > Not for legacy registers access as discussed before.
> 
> Now I am lost. Legacy register access is on top of the AQ, which sends
> commands through the PF.  How you are going to send commands without
> attaching a driver, I don't know.

A VF driver can send commands to its parent PF driver using a well-defined
kernel API.
There is no need to attach to the PF driver.
Upper and lower devices are a common concept.
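
For Linux at least, a minimal sketch of that pattern (pci_physfn() is the
existing kernel helper for finding a VF's parent PF; virtio_pf_submit_aq_cmd()
stands in for whatever API the PF driver would export and is hypothetical):

    #include <linux/pci.h>

    /* The VF driver never binds to the PF device itself; it calls into
     * the driver that already owns the PF. */
    int vf_send_legacy_aq_cmd(struct pci_dev *vf_dev, void *cmd, size_t len)
    {
            struct pci_dev *pf_dev = pci_physfn(vf_dev); /* parent PF */

            /* Hypothetical symbol exported by the PF driver. */
            return virtio_pf_submit_aq_cmd(pf_dev, cmd, len);
    }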

> > Not limited to QEMU.
> > A driver will be able to do this as well.
> 
> No, on some OS-es (windows, iokit, more) a single driver can't bind to many
> devices without a lot of pain.  
There is no need to bind. A well-defined kernel API is enough.

It's been a while since I worked on iokit, but iokit has similar kernel APIs.

It is very exciting to think that one has built an iokit-based hypervisor
hosting PCI accelerated devices and VMs in it.
Last time I checked, I was told that the virtio driver is not present in the
latest OS; not sure how true it is.

> If you are mapping VF BAR, this has to be done by
> VF driver, and please just avoid pain for everyone by exposing the necessary
> info through the VF itself.
> A capability seems perfect for this.
> 
Why not the MMIO region as proposed in v0?
Capability-based read/write requires an extra redirection, so it is less
optimal than MMIO.

> An alternative is not to map VF BAR at all, steal space from PF BAR instead.
How come iokit will be able to access this PF BAR without binding to it?

> Guest is not accessing this anyway. So I really do not see why not from a
> software point of view. There's a hardware reason? Could you talk to the
> hardware guys about this? You objected to AQ too, then the hardware guys
> told you it is not a big deal.
I didn't object to AQ for a hardware reason.
I explained that AQ is not the best compared to MMIO because, by the laws of
physics, it is slower: it needs to move more bits than MMIO does.
And so the capability-based tunneling is also slower than MMIO.

> 
> > > So we have legacy emulation send commands to VF or to PF.  Okay. But
> > > let us avoid the need for VF driver to send commands to PF to initialize.
> > > Just get all information it needs from VF itself.
> > >
> > >
> > > Maybe it's a good idea to reuse existing notification capability, or
> > > maybe a new one, but let's avoid making VF driver depend on PF
> commands.
> > >
> > We agreed in v1 on Jason's suggestion to have the AQ command and yet
> > you and Jason hinder this in v2 with this exact repeated question.
> > Let's please avoid this and move forward.
> 
> How was I supposed to raise this issue on v1 if it was not there?
> Stop arguing in circles and things will move forward.
v1 version clearly had AQ commands.

Please revisit your comments.
v0 -> you suggested:
 - "why not AQ, like transport vq, it can coalesce partial mac writes".

v1 -> over AQ, you suggested:
 - to "drop the size"
 - summarized to split the 2 commands into 4 commands, separating the device
   and config areas.

After that summary, you propose "why not self-contained in the VF itself?",
which circles back to the v0 method (suggesting a capability instead of MMIO,
but not explaining why MMIO, which is better than a capability, is not fine).

And now you propose "have a BAR in the PF which is owned by the other PF
driver", with little reasoning.
And it conflicts with your suggestion of PF driver access...

So I am a bit lost on your circular proposal; or maybe it was just a probing
question, not sure.

> 
> If we have trouble converging on the notification thing, how about we make
> progress without it for now?  Merge a slow version that sends kicks through
> aq, then work on the best way to make notifications faster, separately.
> 
Sure, we can defer that, but the VF has the notification BAR to utilize.
So why not use it?

> 
> > > > We have already been through these design choices and tradeoffs in
> > > > v0 and v1; it doesn't fit the requirements.
> > >
> >
> > > So, I am saying one model is small driver for VF and a big one for PF.
> > > And to keep the VF driver simple, it should get information simply
> > > from config space capability.
> >
> > The VF driver is small; it does the usual vfio passthrough work.
> > The PF driver implements AQ for the variety of use cases that we listed
> > in the AQ cover letter.
> > The VF driver implements the 5 AQ commands that you suggested splitting
> > from 2 into 4.
> 
> VF(member) driver can't implement AQ commands at all. They are sent to the
> PF(owner), thus by the PF(owner) driver.  In virt configs, the VMM can trap
> legacy access and talk to the owner driver to send them.  It can 

[virtio-dev] Re: [virtio-comment] RE: [PATCH v2 1/2] transport-pci: Introduce legacy registers access commands

2023-05-22 Thread Michael S. Tsirkin
On Sun, May 21, 2023 at 02:44:03PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Sunday, May 21, 2023 10:33 AM
> 
> > > Yet you initiate the same discussion point that we already discussed,
> > > again after summarizing.
> > > A driver is not attached to two devices.
> > > A driver is attached to a single device.
> > 
> > And that device is the owner no? to send commands?
> >
> Not for legacy registers access as discussed before.

Now I am lost. Legacy register access is on top of the AQ, which
sends commands through the PF.  How you are going to send
commands without attaching a driver, I don't know.

> > > if (legacy_offset == queue_notify_offset)
> > >         *db = guest_supplied_q_notify_content;
> > > else
> > >         virtio_net_send_aq_cmd();
> > >
> > > "simple" is really a subjective term in this context.
> > 
> > yes this is qemu. sure.
> >
> Not limited to QEMU.
> A driver will be able to do this as well.

No, on some OS-es (windows, iokit, more) a single driver can't bind to
many devices without a lot of pain.  If you are mapping VF BAR, this has
to be done by VF driver, and please just avoid pain for everyone by
exposing the necessary info through the VF itself.
A capability seems perfect for this.
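
For reference, a sketch of what that could look like, reusing the generic
capability layout that already exists in the spec (only the new cfg_type
value suggested here is hypothetical):

    /* Existing virtio_pci_cap layout from the virtio spec; a new
     * cfg_type value marking the legacy notify region would be the
     * only addition. */
    struct virtio_pci_cap {
            u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
            u8 cap_next;    /* Generic PCI field: next capability ptr */
            u8 cap_len;     /* Generic PCI field: capability length */
            u8 cfg_type;    /* e.g. a new VIRTIO_PCI_CAP_LEGACY_NOTIFY_CFG */
            u8 bar;         /* BAR holding the notify region */
            u8 id;          /* distinguishes multiple caps of one type */
            u8 padding[2];
            le32 offset;    /* offset of the notify region within the BAR */
            le32 length;    /* length of the region */
    };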

An alternative is not to map VF BAR at all, steal space from PF BAR instead.
Guest is not accessing this anyway. So I really do not see why not from a
software point of view. There's a hardware reason? Could you talk to the
hardware guys about this? You objected to AQ too, then the hardware guys
told you it is not a big deal.


> > So we have legacy emulation send commands to VF or to PF.  Okay. But let us
> > avoid the need for VF driver to send commands to PF to initialize.
> > Just get all information it needs from VF itself.
> > 
> > 
> > Maybe it's a good idea to reuse existing notification capability, or maybe 
> > a new
> > one, but let's avoid making VF driver depend on PF commands.
> >
> We agreed in v1 on Jason's suggestion to have the AQ command and yet
> you and Jason hinder this in v2 with this exact repeated question.
> Let's please avoid this and move forward.

How was I supposed to raise this issue on v1 if it was not there?
Stop arguing in circles and things will move forward.

If we have trouble converging on the notification thing, how about we
make progress without it for now?  Merge a slow version that sends kicks
through aq, then work on the best way to make notifications faster,
separately.


> > > We have already been through these design choices and tradeoffs in v0
> > > and v1; it doesn't fit the requirements.
> > 
> 
> > So, I am saying one model is small driver for VF and a big one for PF.
> > And to keep the VF driver simple, it should get information simply from 
> > config
> > space capability.
> 
> The VF driver is small; it does the usual vfio passthrough work.
> The PF driver implements AQ for the variety of use cases that we listed in
> the AQ cover letter.
> The VF driver implements the 5 AQ commands that you suggested splitting from
> 2 into 4.

VF(member) driver can't implement AQ commands at all. They are sent to the
PF(owner), thus by the PF(owner) driver.  In virt configs, the VMM can trap
legacy access and talk to the owner driver to send them.  It can also talk
to the member driver to send notifications.  You can, I guess, have the VMM
get member BAR offsets from the owner and pass them to the member for
mapping. This raises all kinds of questions of trust.
If you can map the BAR from the PF, that would be simplest: the VMM then only
pokes at the PF and not the VF. If not, then we really should expose this
info in the VF if at all possible.





-- 
MST





[virtio-dev] Re: [virtio-comment] RE: [PATCH v2 1/2] transport-pci: Introduce legacy registers access commands

2023-05-22 Thread Michael S. Tsirkin
On Mon, May 22, 2023 at 08:54:44AM +0800, Jason Wang wrote:
> On Sun, May 21, 2023 at 10:33 PM Michael S. Tsirkin  wrote:
> >
> > On Sun, May 21, 2023 at 01:21:17PM +, Parav Pandit wrote:
> > >
> > > > From: virtio-comm...@lists.oasis-open.org On Behalf Of Michael S. Tsirkin
> > > > Sent: Sunday, May 21, 2023 5:17 AM
> > >
> > > > > The notifications of the VFs are on the VF BAR for modern or legacy.
> > > > > One needs to build additional cross-forwarding hardware from PF to VF
> > > > > for the doorbells.
> > > >
> > > > I think doorbells are driver notifications (linux driver calls them 
> > > > kicks)?
> > > > I don't understand what you are saying above really.
> > > > what can and what can't be done?
> > > >
> > > VF has the notification BAR region.
> > > All 1.x and legacy notifications land on the VF BAR.
> > >
> > > > Again, all this idea (as opposed to Jason's transport vq) is to have
> > > > a simple software model. Attaching a driver to two devices at the same
> > > > time is hard to achieve, e.g. under windows.
> > > >
> > > Yet you initiate the same discussion point that we already discussed,
> > > again after summarizing.
> > > A driver is not attached to two devices.
> > > A driver is attached to a single device.
> >
> > And that device is the owner no? to send commands?
> >
> > > A device uses a parent/owner group device to access legacy.
> > >
> > > Software model may evolve over time.
> > >
> > > >
> > > > > And it cannot utilize what already exists for 1.x VF.
> > > > >
> > > > > > 2. It should be possible to send notifications through an admin
> > > > > >    command too, otherwise admin commands are an incomplete set of
> > > > > >    functionality.
> > > > >
> > > > > Yes, it is only for the functionality. As we discussed in the past
> > > > > already, this will not gain any performance.
> > > >
> > > > Performance might be ok if hardware disables kicks most of the time.
> > > >
> > > Even for the first kick it is an order of magnitude higher.
> > > Kicks are the natural tool for low latency.
> > >
> > > >
> > > > > > 3. I feel using a capability to describe the legacy notification
> > > > > >    area would be better just because we already have a structure
> > > > > >    for this. Make it an express capability if you like.
> > > > > The AQ command interface is a far more programmable object than a
> > > > > PCI capability to return this admin info.
> > > > > Hence I prefer the AQ cmd.
> > > >
> > > > I feel your preferences for 1 and 3 conflict. If you really insist on
> > > > kicks being on VFs, then at least let us make the VF driver in the
> > > > host simple.
> > > It is straightforward.
> > >
> > > If you prefer the "offset" example from the past:
> > >
> > > if (legacy_offset == queue_notify_offset)
> > >         *db = guest_supplied_q_notify_content;
> > > else
> > >         virtio_net_send_aq_cmd();
> > >
> > > "simple" is really a subjective term in this context.
> >
> > yes this is qemu. sure.
> >
> > So we have legacy emulation send commands to VF or to PF.  Okay. But let
> > us avoid the need for VF driver to send commands to PF to initialize.
> > Just get all information it needs from VF itself.
> >
> >
> > Maybe it's a good idea to reuse existing notification capability,
> > or maybe a new one, but let's avoid making VF driver depend on PF
> > commands.
> >
> >
> >
> > > > If it has to talk to the PF driver, things are really complex. At
> > > > this point we are very, very far from the VFIO model, and then what
> > > > exactly have we gained by implementing the legacy control path in
> > > > hardware?
> > > It is in the VFIO model; legacy largely falls into the special path for
> > > backward compat for the hw devices.
> > >
> > > > Let's do software with maybe a couple of features such as
> > > > VIRTIO_NET_F_LEGACY_HEADER.
> > > >
> > > We have already been through these design choices and tradeoffs in v0
> > > and v1; it doesn't fit the requirements.
> >
> > BTW, not for this project but generally, this is why I thought it is not
> > at all a bad idea to have a requirements text document, e.g. under todo/:
> > discuss it as normal, and maybe even vote on including features in a plan
> > for a release.
> >
> > > We cannot iterate the exact same points 100% again in this summary
> > > discussion.
> > >
> > > If you have any different comments from v0 and v1, please share;
> > > otherwise these commands should proceed.
> > >
> > > What you hint at is saying "transport everything through the PF, then
> > > why do you have a VF?"
> > > We are again past that discussion.
> > >
> > > This is a different requirement than
> > > "There is a VF passthrough accessible directly from the guest without a
> > > VI; some of them need to have legacy support also".
> > > Hence, how does one make that legacy guest utilize the VF in passthrough
> > > manner for 1.x and 0.9.5?
> > > With this proposal, 0.9.5 slow registers are accessed over the owner
> > > group admin PF keeping the 

[virtio-dev] Re: [PATCH v14] virtio-net: support inner header hash

2023-05-22 Thread Michael S. Tsirkin
On Mon, May 22, 2023 at 01:02:36PM +0800, Heng Qi wrote:
> 1. Currently, a received encapsulated packet has an outer and an inner 
> header, but
> the virtio device is unable to calculate the hash for the inner header. The 
> same
> flow can traverse through different tunnels, resulting in the encapsulated
> packets being spread across multiple receive queues (refer to the figure 
> below).
> However, in certain scenarios, we may need to direct these encapsulated 
> packets of
> the same flow to a single receive queue. This facilitates the processing
> of the flow by the same CPU to improve performance (warm caches, less 
> locking, etc.).
> 
>    client1                 client2
>       |      +---------+      |
>       +----->| tunnels |<-----+
>              +---------+
>                 |   |
>                 v   v
>        +-----------------+
>        | monitoring host |
>        +-----------------+
> 
> To achieve this, the device can calculate a symmetric hash based on the inner 
> headers
> of the same flow.
> 
> 2. Legacy systems may lack the entropy fields which modern protocols have
> in the outer header, resulting in multiple flows with the same outer header
> but different inner headers being directed to the same receive queue. This
> results in poor receive performance.
> 
> To address this limitation, inner header hash can be used to enable the 
> device to advertise
> the capability to calculate the hash for the inner packet, regaining better 
> receive performance.
> 
> Signed-off-by: Heng Qi 
> Reviewed-by: Xuan Zhuo 
> ---
> v13->v14:
>   1. Move supported_hash_tunnel_types from config space into cvq command. 
> @Parav Pandit
>   2. Rebase to master branch.
>   3. Some minor modifications.

So, I proposed adding a "generic UDP tunnel" option which simply uses the UDP
source port for the hash. I think it will help us avoid having to chase future
tunnels as more and more are added.
I also suggested dropping some tunnels which are less common and where the
specification is unambiguous enough that the source port should include the
inner hash.
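
A minimal sketch of how that could look on the device's receive-hash path
(all names here are illustrative; none are from the spec or the patch):

    #include <stdbool.h>
    #include <stdint.h>

    struct pkt_meta {                     /* illustrative only */
            bool     is_udp;
            uint16_t outer_udp_sport;     /* filled with inner-flow entropy
                                             by modern encapsulators */
            uint32_t outer_tuple_hash;    /* precomputed outer 5-tuple hash */
    };

    static bool generic_udp_tunnel_hash;  /* the proposed toggle */

    /* Instead of parsing every tunnel format for inner headers, hash
     * only the outer UDP source port. */
    static uint32_t rx_hash(const struct pkt_meta *m)
    {
            if (generic_udp_tunnel_hash && m->is_udp)
                    return (uint32_t)m->outer_udp_sport * 2654435761u; /* toy hash */
            return m->outer_tuple_hash;
    }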

Did you decide not to include this idea then?


> v12->v13:
>   1. Add a GET command for hash_tunnel_types. @Parav Pandit
>   2. Add tunneling protocol explanation. @Jason Wang
>   3. Add comments on some usage scenarios for inner hash.
> 
> v11->v12:
>   1. Add a command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG.
>   2. Refine the commit log. @Michael S . Tsirkin
>   3. Add some tunnel types.
> 
> v10->v11:
>   1. Revise commit log for clarity for readers.
>   2. Some modifications to avoid undefined terms. @Parav Pandit
>   3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
>   4. Add the normative statements. @Parav Pandit
> 
> v9->v10:
>   1. Removed hash_report_tunnel related information. @Parav Pandit
>   2. Re-describe the limitations of QoS for tunneling.
>   3. Some clarification.
> 
> v8->v9:
>   1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
>   2. Add tunnel security section. @Michael S . Tsirkin
>   3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
>   4. Fix some typos.
>   5. Add more tunnel types. @Michael S . Tsirkin
> 
> v7->v8:
>   1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
>   2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
>   3. Removed re-definition for inner packet hashing. @Parav Pandit
>   4. Fix some typos. @Michael S . Tsirkin
>   5. Clarify some sentences. @Michael S . Tsirkin
> 
> v6->v7:
>   1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
>   2. Fix some syntax issues. @Michael S. Tsirkin
> 
> v5->v6:
>   1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
>   2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
>   3. Move the links to introduction section. @Michael S. Tsirkin
>   4. Clarify some sentences. @Michael S. Tsirkin
> 
> v4->v5:
>   1. Clarify some paragraphs. @Cornelia Huck
>   2. Fix the u8 type. @Cornelia Huck
> 
> v3->v4:
>   1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to 
> VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
>   2. Make things clearer. @Jason Wang @Michael S. Tsirkin
>   3. Keep the possibility to use inner hash for automatic receive 
> steering. @Jason Wang
>   4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. 
> many times. @Michael S. Tsirkin
> 
> v2->v3:
>   1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
>   2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason 
> Wang, @Michael S. Tsirkin
> 
> v1->v2:
>   1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
>   2. Clarify some paragraphs. @Jason Wang
>   3. Add \field{hash_tunnel} and 

[virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-22 Thread Michael S. Tsirkin
On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> 1) Add a feature bit to the virtio specification to tell the sender that a
> fully csumed packet must be sent.

Who is the sender in this picture? The driver?

-- 
MST





[virtio-dev] [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-22 Thread Heng Qi


Currently, the VIRTIO_NET_F_GUEST_CSUM (NETIF_F_RXCSUM) feature of the
virtio-net driver conflicts with the loading of an XDP program. This is caused
by the problem described in [1][2]: XDP may corrupt the partial-csum-related
fields, resulting in packet dropping. rx CHECKSUM_PARTIAL mainly exists in
virtualized environments, and its purpose is to save computing overhead.

The *goal* of this proposal is to enable the coexistence of XDP and 
VIRTIO_NET_F_GUEST_CSUM.

1. We need to understand why the device driver receives rx CHECKSUM_PARTIAL
packets.

Drivers related to virtualized environments, such as virtio-net/veth/loopback,
etc., may receive partially csumed packets.

When the tx device finds that the destination rx device of the packet is
located on the same host, the packet clearly may not pass through the physical
link, so the tx device sends the packet with csum_{start, offset} directly to
the rx side to save computational resources, without computing a full csum
(this depends on the specific implementation; some virtio-net backend devices
are known to behave like this currently). From [3], the stack trusts such
packets.
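
For reference, the metadata that travels with such a packet is carried in the
virtio-net header (struct virtio_net_hdr, from the virtio spec and
linux/virtio_net.h):

    struct virtio_net_hdr {
            __u8 flags;             /* VIRTIO_NET_HDR_F_NEEDS_CSUM marks a
                                       partially csumed packet */
            __u8 gso_type;
            __virtio16 hdr_len;
            __virtio16 gso_size;
            __virtio16 csum_start;  /* checksumming starts here */
            __virtio16 csum_offset; /* result goes at csum_start + csum_offset */
    };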

However, veth still has NETIF_F_RXCSUM turned on when loading XDP. This may
cause packet dropping, as [1][2] state. But the veth community does not
currently seem to have reported such problems; can we guess that the
coexistence of XDP and rx CHECKSUM_PARTIAL has less negative impact?

2. About rx CHECKSUM_UNNECESSARY:

We have just seen that in a virtualized environment a packet may flow within
the same host, so not computing the complete csum for the packet saves some
CPU resources.

The purpose of the checksum is to verify that packets passing through the
physical link are correct. Of course, it is also reasonable to compute a full
csum for packets in a virtualized environment, which is exactly what we need.

rx CHECKSUM_UNNECESSARY indicates that the packet has been fully checked, that
is, it is a credible packet. If such a packet is modified by an XDP program,
the user should recalculate the correct checksum using bpf_csum_diff() and
bpf_{l3,l4}_csum_replace().
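
As a concrete illustration, a minimal sketch of that repair after rewriting an
IPv4 destination address (note these two named helpers take a struct __sk_buff,
i.e. they are for skb-based programs such as TC; in XDP proper one would use
bpf_csum_diff() and patch the checksum field by hand; offsets assume plain
Ethernet + IPv4 + TCP without options):

    #include <stddef.h>
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>
    #include <bpf/bpf_helpers.h>

    #define IP_CSUM_OFF  (ETH_HLEN + offsetof(struct iphdr, check))
    #define IP_DST_OFF   (ETH_HLEN + offsetof(struct iphdr, daddr))
    #define TCP_CSUM_OFF (ETH_HLEN + sizeof(struct iphdr) + \
                          offsetof(struct tcphdr, check))

    static __always_inline int rewrite_daddr(struct __sk_buff *skb,
                                             __be32 old_ip, __be32 new_ip)
    {
            /* The TCP csum covers the pseudo header, so fix L4 first. */
            bpf_l4_csum_replace(skb, TCP_CSUM_OFF, old_ip, new_ip,
                                BPF_F_PSEUDO_HDR | sizeof(new_ip));
            bpf_l3_csum_replace(skb, IP_CSUM_OFF, old_ip, new_ip,
                                sizeof(new_ip));
            return bpf_skb_store_bytes(skb, IP_DST_OFF, &new_ip,
                                       sizeof(new_ip), 0);
    }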

Therefore, for those drivers (physical NIC drivers?), such as
atlantic/bnxt/mlx, etc., XDP and NETIF_F_RXCSUM coexist, because their packets
will have been fully checked on the tx side.

AWS's ena driver is also designed to be in this full-checksum mode (we also
mention below that a feature bit can be provided for virtio-net, telling the
sender that a full checksum must be calculated, to implement behavior similar
to other drivers), although it is in a virtualized environment.

3. To sum up:

It seems that only virtio-net makes XDP and VIRTIO_NET_F_GUEST_CSUM mutually
exclusive, which may cause the following problems:

When XDP loads,

1) Packets that are fully checked by the sender are marked CHECKSUM_UNNECESSARY
by rx csum hw offloading; with that offloading disabled, the virtio-net driver
needs additional CPU resources to compute the checksum for every packet.

When testing with the following command in Aliyun ECS:
qperf dst_ip -lp 8989 -m 64K -t 20 tcp_bw
(mtu = 1500, dev layer GRO is on)

The csum-related overhead we measured is 11.7% on x86 and 15.8% on ARM.

2)
One of the main uses of the XDP prog is monitoring and firewalling, etc.,
which means that the XDP prog may not modify the packet at all.
This applies to both rx CHECKSUM_PARTIAL and rx CHECKSUM_UNNECESSARY,
yet we give up the rx csum hw offloading capability in both cases.

4. How we try to solve this:

1) Add a feature bit to the virtio specification to tell the sender that a fully
csumed packet must be sent. Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM
when this feature bit is negotiated (similar to the ENA behavior).

2) Modify the current virtio-net driver:

No longer filter the VIRTIO_NET_F_GUEST_CSUM feature in virtnet_xdp_set().
Then we immediately regain the rx csum capability from VIRTIO_NET_F_GUEST_CSUM
and enjoy the CPU resources saved by rx csum hw offloading.
(This method is a bit crude.)
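
For option 2), the check to relax looks roughly like the guard added by [1]
in virtnet_xdp_set() (paraphrased, so treat the exact shape as approximate;
dropping the GUEST_CSUM line is the change):

    /* Paraphrased shape of the current guard in virtnet_xdp_set();
     * option 2) would remove the VIRTIO_NET_F_GUEST_CSUM line. */
    if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN)  ||
        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)  ||
        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) {
            NL_SET_ERR_MSG_MOD(extack,
                    "Can't set XDP while host is implementing GSO/CSUM");
            return -EOPNOTSUPP;
    }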

5. Ending

This is a proposal and does not represent a formal solution. Looking forward to
feedback from the community, and to exploring a possible/common solution to the
problem described in this proposal.

6. Quote

[1] 18ba58e1c234

virtio-net: fail XDP set if guest csum is negotiated

We don't support partial csumed packet since its metadata will be lost
or incorrect during XDP processing. So fail the XDP set if guest_csum
feature is negotiated.

[2] e59ff2c49ae1

virtio-net: disable guest csum during XDP set

We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
can receive partial csumed packets with metadata kept in the
vnet_hdr. This may have several side effects:

- It could be overridden by header adjustment, thus it might not be
  correct after XDP processing.
- There's no way to pass such metadata information through