[virtio-dev] Re: [virtio-comment] [PATCH v10] virtio-net: support the virtqueue coalescing moderation

2023-03-08 Thread Heng Qi




On 2023/3/7 6:57 AM, Michael S. Tsirkin wrote:

On Thu, Mar 02, 2023 at 11:36:18AM +, David Edmondson wrote:

+for an enabled transmit/receive 
virtqueue whose number is \field{vqn}.

Should this now be "whose index is \field{vqn}"?

Ugh.  I guess we'll have to fix the number/index mess in the spec
first. Parav, you said you are looking into it?




# Where virtqueue number and virtqueue index are used.
  1. In the Virtqueue Configuration Section, use the virtqueue index: 
"Write the **virtqueue index** (first queue is 0) to queue_select."

  2. Both descriptions are used separately in the Notification Section.
  2.1 Here vqn is called virtqueue index:
    "When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, 
the driver sends an available buffer notification
  to the device by writing the **16-bit virtqueue index** 
of this virtqueue to the Queue Notify address.

  ...

  le32 {
 vqn: 16;
 next_off : 15;
 next_wrap: 1;
 };
 ...

 If VIRTIO_F_NOTIFICATION_DATA has not been negotiated, the 
driver MUST use the queue_notify_data value instead of the **virtqueue 
index**."


  2.2 Here vqn is called virtqueue number:
    "When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, 
the Notification data contains the **Virtqueue number**.

    ...

    be32 {
    vqn: 16;
    next_off : 15;
    next_wrap: 1;
 };
    ...

    When VIRTIO_F_NOTIFICATION_DATA has not been negotiated, 
this notification involves sending the **virtqueue number** to the 
device (method depending on the transport).

    ...

    vqn -- **VQ number** to be notified."

# 0-based index and 0-based number are used respectively in the RSS Section:
1. "Field unclassified_queue contains the **0-based index** of the 
receive virtqueue to place unclassified packets in. Index 0 corresponds 
to receiveq1."
2. "use the result as the index in the indirection table to get 
**0-based number** of destination receiveq (value of 0 corresponds to 
receiveq1)."




\field{vqn} has been called '0-based virtqueue index' or '0-based virtqueue number'.

I think both are friendly to readers, so what are your opinions?

Thanks.






-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] RE: [PATCH v10] virtio-net: support the virtqueue coalescing moderation

2023-03-08 Thread Heng Qi




On 2023/3/9 6:30 AM, Parav Pandit wrote:

From: virtio-dev@lists.oasis-open.org  On
Behalf Of Heng Qi
Sent: Thursday, March 2, 2023 10:27 PM


I remember we discussed that, instead of mentioning each individual field,

it is better to describe the whole structure as read-only or write-only.

Consider the following scenarios:
1. A read-only field of the structure virtio_net_ctrl_coal is extended for
CTRL_NOTF_COAL_RX/TX_SET to get some extra information

A set command cannot extend the struct virtio_net_ctrl_coal partially as
read-only and partially as write-only.
This would mean that, for a tiny number of bytes, an additional DMA descriptor
has to be allocated with read-only or write-only permissions.
It would be inefficient for the driver to do so for the SET command, with vqn
as write-only, reserved as read-only, and the rest of the fields as write-only
DMA attributes.

As I think about it more, all of the set command's fields should be read-only
for the device. Better to describe it this way instead of splitting individual
fields. This way the driver can just do a single DMA allocation with read-only
attributes for the set command.

The get command doesn't have any choice today; the way CVQ is structured, it
lives with the limitation.


I think it is reasonable and will be revised in the next version.




Looks good; however, you have covered this well in the device normative
statements.

So possibly it can be removed from here.

I tend to keep this. As we have done in the past, we can have normative
descriptions and the corresponding non-normative descriptions.


Ok, but please revisit whether the description can be simpler text than the
normative lines.


Ok.

Thanks.






Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-08 Thread Heng Qi




On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:


On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

 it is necessary for entropy from encapsulated packets to be
 exposed in the tunnel header.  The most common technique for this is
 to use the UDP source port

The endpoint of the tunnel is called the gateway (with DPDK running on top of it).

1. When there is no inner header hash, entropy can be inserted into the UDP
src port of the outer header of the tunnel, and then the tunnel packet is
handed over to the host. The host needs to dedicate a part of its CPUs to
parse the outer headers (but not drop them) and calculate the inner hash for
the inner payloads, and then use the inner hash to forward the packets to
another part of the CPUs that are responsible for processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?


Assuming that the same flow is a unidirectional flow a->b, or a bidirectional
flow a->b and b->a, such a flow may be processed out of order by the gateway
(DPDK):

1. In unidirectional mode, if the same flow is switched to another gateway
for some reason, resulting in a different outer IP address, then this flow
may be processed by different CPUs after reaching the host if there is no
inner hash. So after the host receives the flow, it first uses the forwarding
CPUs to parse the inner hash, and then uses the hash to ensure that the flow
is processed by the same CPU.
2. In bidirectional mode, the a->b flow may go to gateway 1, and the b->a
flow may go to gateway 2. In order to ensure that the same flow is processed
by the same CPU, we still need the forwarding CPUs to parse the real inner
hash (here, the hash key needs to be replaced with a symmetric hash key).





1). During this process, the CPUs on the host are divided into two parts: one
part is used as a forwarding node to parse the outer header, and its CPU
utilization is low. The other part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.


2). The entropy of the outer UDP source port is not enough; that is, the
queues are not widely distributed.

how isn't it enough? 16 bit is enough to cover all vqs ...


A 5-tuple brings more entropy than a single port, doesn't it? In fact, the
inner hash of the physical network card used by the business team is indeed
better than the outer-header UDP port number we modify now, but they did not
give me the data.



2. When there is an inner header hash, the gateway will directly parse the
outer header and use the inner 5-tuple to calculate the inner hash. The
tunneled packet is then handed over to the host.
1) All the CPUs of the host are used to process data packets; there is no
need to use some CPUs to forward and parse the outer header.

You really have to parse the outer header anyway,
otherwise there's no tunneling.
Unless you want to teach virtio to implement tunneling
in hardware, which is something I'd find it easier to
get behind.


There is no need to parse the outer header twice, because we use shared 
memory.



2) The entropy of the original 5-tuple is sufficient, and the queues are
widely distributed.

It's exactly the same entropy, why would it be better? In fact you
are taking out the outer hash entropy making things worse.


I don't get the point. Why would the entropy of the inner 5-tuple and the
outer tunnel header be the same? Multiple streams can have the same outer
header.

Thanks.



Thanks.

same goes for vxlan; I did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/

Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-09 Thread Heng Qi




On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?
For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port

same goes for vxlan; I did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?



Inner hash can at least distribute tunnel flows without outer transport
headers, like GRE, to multiple queues, which is beneficial to us.

For tunnel flows with outer transport headers, like VXLAN, although they can
hash flows to different queues by setting different outer UDP ports, this
does not conflict with inner hash. Inner hashing can also be used for this
purpose.


For the same flow, packets in the receiving and sending directions may pass
through different tunnels, which causes the same flow to be hashed to
different queues. In this case, we have to calculate a symmetric hash (which
can be called an inner symmetric hash, a type of inner hash) over the inner
header, so that the same flow can be hashed to the same queue.


Symmetric hashing ignores the order of the 5-tuple when calculating the hash;
that is, the hash values calculated from (a1, a2, a3, a4) and (a2, a1, a4, a3)
are the same.


Thanks.






Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-10 Thread Heng Qi





On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:


On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

  it is necessary for entropy from encapsulated packets to be
  exposed in the tunnel header.  The most common technique for this is
  to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?



Let's simplify some details and take a fresh look at two different scenarios:
VXLAN and GENEVE (Scenario 1) and GRE (Scenario 2).

1. In Scenario 1, we can improve the processing performance of the same flow
by implementing inner symmetric hashing. This is because even though client1
and client2 communicate bidirectionally through the same flow, their data may
pass through, and be encapsulated by, different tunnels, resulting in the
same flow being hashed to different queues and processed by different CPUs.
To ensure consistent and optimized processing, we need to parse out the inner
header and compute a symmetric hash on it using a special RSS key. Sorry for
not mentioning the inner symmetric hash before; I wanted to avoid introducing
more concepts, but it is indeed a kind of inner hash.

2. In Scenario 2 with GRE, the lack of outer transport headers means that
flows between multiple communication pairs encapsulated by the same tunnel
will all be hashed to the same queue. To address this, we need to implement
inner hashing to improve the performance of RSS. By parsing and calculating
the inner hash, different flows can be hashed to different queues.

Thanks.




Assuming that the same flow includes a unidirectional flow a->b, or a
bidirectional flow a->b and b->a,
such flow may be out of order when processed by the gateway(DPDK):

1. In unidirectional mode, if the same flow is switched to another gateway
for some reason, resulting in different outer IP address,
     then this flow may be processed by different CPUs after reaching the
host if there is no inner hash. So after the host receives the
     flow, first use the forwarding CPUs to parse the inner hash, and then
use the hash to ensure that the flow is processed by the
     same CPU.
2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
go to gateway 2. In order to ensure that the same flow is
     processed by the same CPU, we still need the forwarding CPUs to parse
the real inner hash(here, the hash key needs to be replaced with a symmetric
hash key).

Oh interesting. What are those gateways? How come there's an expectation that
you can change their addresses and topology completely seamlessly, without
any reordering whatsoever? Isn't a network topology change kind of guaranteed
to change ordering sometimes?



1). During this process, the CPUs on the host is divided into two parts, one
part is used as a forwarding node to parse the outer header,
   and the CPU utilization is low. Another part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.


2). The entropy of the source udp src port is not enough, that is, the queue
is not widely distributed.

how isn't it enough? 16 bit is enough to cover all vqs ...

A 5-tuple brings more entropy than a single port, doesn't it?

But you don't need more for RSS, the indirection table is not
that large.


In fact, the inner hash of the physical network card used by the business
team is indeed better than the udp port number of the outer header we modify
now, but they did not give me the data.

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-15 Thread Heng Qi




On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:

On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:



On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:

On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

   it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header.  The most common technique for this is
   to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?


Let's simplify some details and take a fresh look at two different
scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).

1. In Scenario1, we can improve the processing performance of the same flow
by implementing inner symmetric hashing.

This is because even though client1 and client2 communicate bidirectionally
through the same flow, their data may pass

through and be encapsulated by different tunnels, resulting in the same flow
being hashed to different queues and processed by different CPUs.

To ensure consistency and optimized processing, we need to parse out the
inner header and compute a symmetric hash on it using a special rss key.

Sorry for not mentioning the inner symmetric hash before, in order to
prevent the introduction of more concepts, but it is indeed a kind of inner
hash.

If parts of a flow go through different tunnels won't this cause
reordering at the network level? Why is it so important to prevent it at
the nic then?  Or, since you are stressing symmetric hash, are you
talking about TX and RX side going through different tunnels?


Yes, the directions client1->client2 and client2->client1 may go through
different tunnels. Using inner symmetric hashing lets the same CPU process
both directions of the same flow, to improve performance.






2. In Scenario2 with GRE, the lack of outer transport headers means that
flows between multiple communication pairs encapsulated by the same tunnel

will all be hashed to the same queue. To address this, we need to implement
inner hashing to improve the performance of RSS. By parsing and calculating

the inner hash, different flows can be hashed to different queues.

Thanks.



Well, 2 is at least inexact; there's a FlowID there. It's just 8 bits


We use the most basic GRE header fields (not NVGRE), not even the optional
fields.
There is also no flow ID in the GRE header; could you be referring to NVGRE?


Thanks.


so not sufficient if there are more than 512 queues. Still 512 queues
is quite a lot. Are you trying to solve for configurations with
more than 512 queues then?



Assuming that the same flow includes a unidirectional flow a->b, or a
bidirectional flow a->b and b->a,
such flow may be out of order when processed by the gateway(DPDK):

1. In unidirectional mode, if the same flow is switched to another gateway
for some reason, resulting in different outer IP address,
      then this flow may be processed by different CPUs after reaching the
host if there is no inner hash. So after the host receives the
      flow, first use the forwarding CPUs to parse the inner hash, and then
use the hash to ensure that the flow is processed by the
      same CPU.
2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
go to gateway 2. In order to ensure that the same flow is
      processed by the same CPU, we still need the forwarding CPUs to parse
the real inner hash(here, the hash key needs to be replaced with a symmetric
hash key).

Oh interesting. What are those gateways? How come there's an expectation that
you can change their addresses and topology completely seamlessly, without
any reordering whatsoever? Isn't a network topology change kind of guaranteed
to change ordering sometimes?



[virtio-dev] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-15 Thread Heng Qi




On 2023/3/15 11:23 AM, Parav Pandit wrote:



On 3/6/2023 10:48 AM, Heng Qi wrote:


  +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash

Maybe say inner packet header hash.
This makes it a little clearer about "which header", which you explained in
the commit log.




Sure, I'll add this.


+    for tunnel-encapsulated packets.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications 
coalescing.
    \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 
packets.
@@ -139,6 +142,7 @@ \subsubsection{Feature bit 
requirements}\label{sec:Device Types / Network Device

  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.

  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
I think this should also say that HASH_TUNNEL requires either F_RSS or
F_HASH_REPORT, because without them HASH_TUNNEL is not useful.


F_HASH_TUNNEL indicates that the hash should be calculated using the inner
packet header. Even without F_RSS or F_HASH_REPORT, we can continue to use
the hash value in scenarios such as RPS or eBPF programs.



Right?
If no, then my comments below are meaningless.


I think it's fine to let F_HASH_TUNNEL rely on F_RSS or F_HASH_REPORT, as
those are probably the important scenarios where the inner packet header hash
is used.






  \end{description}
    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device 
Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -198,20 +202,27 @@ \subsection{Device configuration 
layout}\label{sec:Device Types / Network Device

  u8 rss_max_key_size;
  le16 rss_max_indirection_table_length;
  le32 supported_hash_types;
+    le32 supported_tunnel_hash_types;
  };
  \end{lstlisting}
-The following field, \field{rss_max_key_size} only exists if 
VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
+The following field, \field{rss_max_key_size} only exists if 
VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or 
VIRTIO_NET_F_HASH_TUNNEL is set.

  It specifies the maximum supported length of RSS key in bytes.

I think the rss_max_key_size field dependency should be only on the existing
feature bits F_RSS and F_HASH_REPORT, because those are the bits that really
decide whether to consider rss_max_key_size.


  The following field, \field{rss_max_indirection_table_length} only 
exists if VIRTIO_NET_F_RSS is set.
  It specifies the maximum number of 16-bit entries in RSS 
indirection table.
    The next field, \field{supported_hash_types} only exists if the 
device supports hash calculation,

-i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
+i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or 
VIRTIO_NET_F_HASH_TUNNEL is set.



Same as above.

  Field \field{supported_hash_types} contains the bitmask of 
supported hash types.
  See \ref{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash types} for details of supported hash 
types.
  +The next field, \field{supported_tunnel_hash_types} only exists if 
the device
+supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is 
set.

+
The line "The next field ..." above can just be the same as "Device supports
inner packet header hash calculation, i.e...". This is because here the term
"header" is missing, which is present in the definition of feature bit 52.


I'll rephrase the description to make the whole text more consistent.



    The device MUST set \field{rss_max_key_size} to at least 40, if 
it offers

-VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
+VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL.
This needs to change if the first comment above about rss_max_key_size is
right.


  The device MUST set \field{rss_max_indirection_table_length} to at 
least 128, if it offers

  VIRTIO_NET_F_RSS.
@@ -843,11 +854,13 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network

  \begin{itemize}
  \item The feature VIRTIO_NET_F_RSS was negotiated. The device uses 
the hash to determine the receive virtqueue to place incoming packets.
  \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The 
device reports the hash value and the hash type with the packet.
+\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The 
device supports inner hash calculation.

  \end{itemize}


inner packet header hash ..


Ok.



+If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated, the 
encapsulation
+hash type below indicates that the hash is calculated over the inner 
header of

+the encapsulated packet:
+Hash type applicable for inner payload of the gre-encapsulated packet
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE (1 << 1)
+\end{lstlisting}

[virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-15 Thread Heng Qi




On 2023/3/15 8:10 PM, Michael S. Tsirkin wrote:

On Tue, Mar 14, 2023 at 11:23:55PM -0400, Parav Pandit wrote:

If not, for now it may be better to skip VXLAN and NVGRE as they inherently
have a unique outer-header UDP src port based on the inner header.

So what's left, GRE?  GRE is actually different, in that it's not IP at
all.


I do not think so. I mentioned that VXLAN and GENEVE need inner symmetric
hashing, and we need this.


And we know inner hashing doesn't conflict with other ways of adding 
entropy.


Thanks.


So if we are talking about GRE, hash is indeed not calculated at all at
the moment, right?  And I would say a natural first step for GRE is
actually adding a hash type that will support this protocol.

How about doing that? It seems like this should be a small step
and completely uncontroversial.








[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi
On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > 
> > 
> > On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:
> > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > 
> > > > 
> > > > On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > > If the tunnel is used to encapsulate the packets, the hash 
> > > > > > > > > > calculated
> > > > > > > > > > using the outer header of the receive packets is always 
> > > > > > > > > > fixed for the
> > > > > > > > > > same flow packets, i.e. they will be steered to the same 
> > > > > > > > > > receive queue.
> > > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > > Yes, you are right. That's what we did before the inner header 
> > > > > > > > hash, but it
> > > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > > 
> > > > > > > > > For example geneve spec says:
> > > > > > > > > 
> > > > > > > > >it is necessary for entropy from encapsulated packets 
> > > > > > > > > to be
> > > > > > > > >exposed in the tunnel header.  The most common 
> > > > > > > > > technique for this is
> > > > > > > > >to use the UDP source port
> > > > > > > > The end point of the tunnel called the gateway (with DPDK on 
> > > > > > > > top of it).
> > > > > > > > 
> > > > > > > > 1. When there is no inner header hash, entropy can be inserted 
> > > > > > > > into the udp
> > > > > > > > src port of the outer header of the tunnel,
> > > > > > > > and then the tunnel packet is handed over to the host. The host 
> > > > > > > > needs to
> > > > > > > > take out a part of the CPUs to parse the outer headers (but not 
> > > > > > > > drop them)
> > > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > > and then use the inner
> > > > > > > > hash to forward them to another part of the CPUs that are 
> > > > > > > > responsible for
> > > > > > > > processing.
> > > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > > tunnel, why is your host doing this?
> > > > 
> > > > Let's simplify some details and take a fresh look at two different
> > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > > 
> > > > 1. In Scenario1, we can improve the processing performance of the same 
> > > > flow
> > > > by implementing inner symmetric hashing.
> > > > 
> > > > This is because even though client1 and client2 communicate 
> > > > bidirectionally
> > > > through the same flow, their data may pass
> > > > 
> > > > through and be encapsulated by different tunnels, resulting in the same 
> > > > flow
> > > > being hashed to different queues and processed by different CPUs.
> > > > 
> > > > To ensure consistency and optimized processing, we need to parse out the
> > > > inner header and compute a symmetric hash on it using a special rss key.
> > > > 
> > > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > > prevent the introduction of more concepts, but it is indeed a kind of 
> > > > inner
> > > > hash.
> > > If parts of a flow go through 

[virtio-dev] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi




On 2023/3/15 11:09 PM, Michael S. Tsirkin wrote:

On Wed, Mar 15, 2023 at 09:19:43PM +0800, Heng Qi wrote:

Any encapsulation technology that includes a UDP/L4 header likely does not
need hashing based on the inner header. This is because the outer-header src
port entropy is already added based on the inner header.

I was not able to follow the discussion in v9 that you had with Michael.
Did you conclude if this is needed for vxlan too?

If not, for now it may be better to skip VXLAN and NVGRE as they inherently
have a unique outer-header UDP src port based on the inner header.

Symmetric hashing ignores the order of the 5-tuple when calculating the hash;
that is, (a1,a2,p1,p2) and (a2,a1,p2,p1) produce the same hash.
There is a scenario where the two directions of the same flow,
client1->client2 and client2->client1, may pass through different tunnels.
In order to allow the data in both directions to be processed by the same
CPU, we need to calculate a symmetric hash based on the inner packet header.
Sorry I didn't mention this earlier; I just wanted to avoid introducing the
concept of symmetric hashing.

But the hash is already there in the port. Is it then maybe just
the question of ignoring the IP addresses when hashing?


We do not ignore the IP address, because after the tunnel sends the 
packets to the processing host,
the processing host will parse the outer headers, and then use the inner 
symmetric hash to hand over the
packets of the same flow to the same cpu for processing (for the network 
topology, please check my latest reply thread).


Thanks.
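The symmetric hashing property described above can be sketched minimally (an illustration only, not the device's actual algorithm; the helper name is hypothetical):

```python
import zlib

def symmetric_flow_hash(src_ip, dst_ip, src_port, dst_port):
    # Canonically order the two endpoints so that both directions of a
    # flow -- (a1,a2,p1,p2) and (a2,a1,p2,p1) -- hash to the same value.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return zlib.crc32(repr((a, b)).encode())

fwd = symmetric_flow_hash("10.0.0.1", "10.0.0.2", 1234, 80)
rev = symmetric_flow_hash("10.0.0.2", "10.0.0.1", 80, 1234)
assert fwd == rev  # both directions land on the same RSS queue
```

Any order-insensitive combination of the tuple fields (sorting, as here, or XOR-folding src/dst) gives the symmetric property.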



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi
On Wed, Mar 15, 2023 at 07:06:53PM -0400, Parav Pandit wrote:
> 
> 
> On 3/15/2023 9:19 AM, Heng Qi wrote:
> >
> >
> >在 2023/3/15 上午11:23, Parav Pandit 写道:
> >>
> >>
> >>On 3/6/2023 10:48 AM, Heng Qi wrote:
> >>
> [..]
> >>>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> >>>+\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
> >>I think this should also say that HASH_TUNNEL requires either of
> >>the F_RSS or F_HASH_REPORT.
> >>Because without it HASH_TUNNEL is not useful.
> >
> >F_HASH_TUNNEL indicates that the hash should be calculated using
> >the inner packet header. Even without F_RSS or F_HASH_REPORT,
> >we can still use the hash value in scenarios such as RPS or
> >eBPF programs.
> Yes.
> Even for rps or ebpf programs, F_HASH_TUNNEL is fine.
> When such a feature arrives in the future, the above line will gain an
> OR condition for the RPS feature bit.
> 
> >
> >
> >I think it's fine to let F_HASH_TUNNEL rely on F_RSS or
> >_F_HASH_REPORT as those are probably important scenarios where
> >inner packet header hash is used.
> Yes.
> 
> >>If not, for now it may be better to skip vxlan and nvegre as
> >>they inherently have unique outer header UDP src port based on
> >>the inner header.
> >
> >Symmetric hashing ignores the order of the tuple elements when
> >calculating the hash; that is, (a1,a2,p1,p2) and
> >(a2,a1,p2,p1) yield the same hash.
> >There is a scenario that the two directions client1->client2 and
> >client2->client1 of the same flow may pass through different
> >tunnels.
> >In order to allow the data in two directions to be processed by
> >the same CPU, we need to calculate a symmetric hash based on the
> >inner packet header.
> I am lost in two directions and two clients above.
> When you say two directions, do you mean tx and rx?
> and do symmetric hashing between tx and rx between two end points
> within single tunnel?
>

A rough description of this scenario is as follows:
Client1 sends packets to client2, and client2 sends packets to client1.
These are the two directions, and they belong to the same flow.
The packets in the two directions may reach the processing host through
different tunnels. In this scenario, we need to hash these packets to the
same queue for the same cpu to process, so inner symmetric hashing is required.


client1                       client2
   |                             |
   |        +---------+          |
   +------->| tunnels |<---------+
            +---------+
              |     |
              v     v
        +-----------------+
        | processing host |
        +-----------------+


> >>>+\field{hash_tunnel_types} is set to
> >>>VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for the
> >>>+unencapsulated packets.
> >>>+
> I missed this before. unencapsulated is not a term.
> s/unencapsulated packets/Non encapsulated packets or non tunneled packets
> 

Yes, you are right!

Thanks.

> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscr...@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
> List help: virtio-comment-h...@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/




Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v10] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi




在 2023/3/16 上午7:24, Parav Pandit 写道:


On 3/15/2023 8:10 AM, Michael S. Tsirkin wrote:

On Tue, Mar 14, 2023 at 11:23:55PM -0400, Parav Pandit wrote:
If not, for now it may be better to skip vxlan and nvegre as they 
inherently

have unique outer header UDP src port based on the inner header.


So what's left, GRE?  GRE is actually different, in that it's not IP at
all.

Sorry, I wrongly wrote nvegre above.

IPoIP, GRE and NVGRE are left.

vxlan and geneve have the udp src entropy.




Not sure I understand "it's not IP at all".

GRE has outer IP header + GRE header with the key to identify the flow.
The key is effectively the hash for the flow.


So if we are talking about GRE, hash is indeed not calculated at all at
the moment, right? 
A hash of the src and dst IP in the outer IP header can still be
calculated for GRE when the optional key is not present.



And I would say a natural first step for GRE is
actually adding a hash type that will support this protocol.

For GRE and NVGRE, GRE_header.key as the flow/hash identifier should
work without the inner header hash.
Older versions of GRE don't have a key, so the inner header hash is
useful there.


Yes, indeed.
The old GRE does not have keys such as a flow id. Even with the new GRE,
we cannot rely on the optional fields,
because we interoperate with various operators, and the most basic
fields give the best compatibility.


Thanks.




How about doing that? It seems like this should be a small step
and completely uncontroversial.
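A small parser sketch illustrates the point above: RFC 2890 GRE can carry a 32-bit key usable as a flow identifier, while legacy RFC 2784 GRE has none, which is why the inner header hash is needed there (illustrative code, not spec text):

```python
import struct

def parse_gre_key(gre: bytes):
    # GRE header (RFC 2784/2890): 16-bit flags+version, 16-bit protocol
    # type, then optional fields gated by the C/K/S flag bits.
    flags, proto = struct.unpack(">HH", gre[:4])
    offset = 4
    if flags & 0x8000:              # C bit: checksum + reserved word
        offset += 4
    if flags & 0x2000:              # K bit (RFC 2890): 32-bit key
        return struct.unpack(">I", gre[offset:offset + 4])[0]
    return None                     # legacy RFC 2784: no key field

# RFC 2890 header, K bit set, key 0xdeadbeef, protocol 0x0800 (IPv4)
hdr = struct.pack(">HHI", 0x2000, 0x0800, 0xDEADBEEF)
assert parse_gre_key(hdr) == 0xDEADBEEF
# Legacy RFC 2784 header: no key, so only the inner header can
# distinguish flows sharing the tunnel.
assert parse_gre_key(struct.pack(">HH", 0x0000, 0x0800)) is None
```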





Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v13] virtio-net: support the virtqueue coalescing moderation

2023-03-22 Thread Heng Qi




在 2023/3/23 上午1:02, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:53 PM

On Wed, Mar 22, 2023 at 04:49:58PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:47 PM

I agree with Cornelia here. Yes if devices do not want to trust
drivers then they will validate input but what exactly happens then is

currently up to device.

If we want to try and specify devices in all cases of out of spec
input that's a big project, certainly doable but I would rather not
connect it to this, rather boutique, feature.

Both your and Cornelia's comments are abstract to me.
We cannot change past.

But we can make sure things are consistent. Currently we don't describe device
behaviour if driver is out of spec and I see 0 reasons to start doing it with
coalescing commands specifically.


For the new command of interest here, when the driver supplies incorrect values,
the device will return an error.

It might be easier for device to just set NEEDS_RESET and stop responding.

This approach of treating all errors as a fatal category is completely the 
opposite of making the device and driver resilient to (recoverable) errors.
We shouldn't go this route.
Different discussion...


For
a hypervisor implementation that's often better than returning error since
device state is then preserved making things easier to debug.


How to implement is upto the device to figure out.


what to do is also up to the device.

Arguing that because error codes were not returned previously, the new command
cannot return an error code either, is going backward.

Returning the failure code is a way to indicate that the driver had a 
recoverable error.


I agree with you. Part of the specification [1] covered something we're 
talking about, e.g. if an untrusted driver sends a disabled vq, the 
device returns an error:


[1] +The device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and 
VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the 
designated virtqueue is disabled.


Maybe we should modify [1] to:

"The device MUST respond to VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and 
VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET commands with VIRTIO_NET_ERR if the 
designated \field{vqn} is not the virtqueue number of an enabled 
transmit or receive virtqueue."



Thanks!








Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash

2023-03-22 Thread Heng Qi




在 2023/3/23 上午11:13, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:42 PM
  

Yes. But my point is this. Some flows can be IPv4 others IPv6.
Do you see a way to have a key that will result in a symmetrical hash for both
IPv4 and IPv6? Can you give an example please?


Heng,

Is that the requirement to have two completely different flows (IPv4, IPv6) to
steer to a single RQ?


Michael should be talking about whether there is a symmetric key that 
can serve both IPv4 and IPv6, so that they can respectively achieve the 
purpose of symmetric hashing.
I am not an expert in hashing, but this article [1] deduces a symmetric 
hash key, and I think it should be possible to deduce a specific key to 
meet this requirement.


Or we should support XOR hashing.

Thanks.


The requirement, what I understood, is between two directions for a flow to 
result in a single hash value.






Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash

2023-03-22 Thread Heng Qi




在 2023/3/23 上午11:58, Heng Qi 写道:



在 2023/3/23 上午11:13, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, March 22, 2023 12:42 PM
Yes. But my point is this. Some flows can be IPv4 others IPv6.
Do you see a way to have a key that will result in a symmetrical 
hash for both

IPv4 and IPv6? Can you give an example please?


Heng,

Is that the requirement to have two completely different flows (IPv4,
IPv6) to steer to a single RQ?


Michael should be talking about whether there is a symmetric key that 
can serve both IPv4 and IPv6, so that they can respectively achieve 
the purpose of symmetric hashing.
I am not an expert in hashing, but this article [1] deduces a 
symmetric hash key, and I think it should be possible to deduce a 
specific key to meet this requirement.


Sorry for the lost link:
[1] https://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf



Or we should support XOR hashing.

Thanks.

The requirement, what I understood, is between two directions for a 
flow to result in a single hash value.
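The symmetric key referenced in [1] can be sketched: a Toeplitz key built from the repeating 16-bit pattern 0x6d5a (per the cited TR-symRSS paper) makes the hash invariant under swapping the 32-bit-aligned IPv4 addresses and the 16-bit-aligned ports. A minimal, illustrative implementation (not a device algorithm):

```python
import socket
import struct

def toeplitz_hash(key: bytes, data: bytes) -> int:
    # Standard Toeplitz: for every input bit set to 1, XOR in the
    # current left-most 32 bits of the (conceptually shifting) key.
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

# Symmetric key from the cited paper: 16-bit pattern repeated to 40 bytes.
SYM_KEY = bytes.fromhex("6d5a" * 20)

def v4_tuple(src, dst, sport, dport):
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack(">HH", sport, dport))

fwd = toeplitz_hash(SYM_KEY, v4_tuple("10.0.0.1", "10.0.0.2", 1234, 80))
rev = toeplitz_hash(SYM_KEY, v4_tuple("10.0.0.2", "10.0.0.1", 80, 1234))
assert fwd == rev  # symmetric for both directions of the flow
```

Because the key is periodic in 16 bits, the same key also works for IPv6 tuples (addresses stay 32-bit aligned); XOR hashing is symmetric by construction and needs no special key.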






[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-30 Thread Heng Qi




在 2023/3/21 上午3:45, Michael S. Tsirkin 写道:

On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:

On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:


在 2023/3/15 下午7:58, Michael S. Tsirkin 写道:

On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:


在 2023/3/10 上午3:36, Michael S. Tsirkin 写道:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:

在 2023/3/8 下午10:39, Michael S. Tsirkin 写道:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

在 2023/2/28 下午7:16, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
dedicate part of its CPUs to parse the outer headers (but not drop them)
and calculate the inner hash over the inner payloads,
and then use the inner hash to forward the packets to the other CPUs that
are responsible for processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?

Let's simplify some details and take a fresh look at two different
scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).

1. In Scenario1, we can improve the processing performance of the same flow
by implementing inner symmetric hashing.

This is because even though client1 and client2 communicate bidirectionally
through the same flow, their data may pass

through and be encapsulated by different tunnels, resulting in the same flow
being hashed to different queues and processed by different CPUs.

To ensure consistency and optimized processing, we need to parse out the
inner header and compute a symmetric hash on it using a special rss key.

Sorry for not mentioning the inner symmetric hash before, in order to
prevent the introduction of more concepts, but it is indeed a kind of inner
hash.

If parts of a flow go through different tunnels won't this cause
reordering at the network level? Why is it so important to prevent it at
the nic then?  Or, since you are stressing symmetric hash, are you
talking about TX and RX side going through different tunnels?

Yes, the directions client1->client2 and client2->client1 may go through
different tunnels.
Using inner symmetric hashing can satisfy the same CPU to process two
directions of the same flow to improve performance.

Well sure but ... are you just doing forwarding or inner processing too?

When there is an inner hash, there is no forwarding anymore.


If forwarding why do you care about matching TX and RX queues? If e2e

In fact, we are just matching on the same rx queue. The network topology
is roughly as follows. The processing host will receive the packets
sent from client1 and client2 respectively, then make some action judgments,
and return them to client2 and client1 respectively.

client1                       client2
   |                             |
   |        +--------+           |
   +------->| tunnel |<----------+
            +--------+
              |    |
              v    v
        +-----------------+
        | processing host |
        +-----------------+

Thanks.

monitoring host would be a better term


Sure.

I'm so sorry I didn't realize I missed this until I checked my emails. 😮 :(





processing can't you just store the incoming hash in the flow and reuse
on TX? This is what Linux is doing...






2. In Scenario2 with GRE, the lack of outer transport headers means that
flows between multiple communication pairs encapsulated by the same tunnel

will all be hashed to the same queue. To address this, we need to implement
inner hashing to improve the performance of RSS. By parsing and calculating

the inner hash, different flows can be hashed to different queues.

Thanks.



Well 2 is at least inexact, there's flowID there. It's just 8 bit

We use the most basic GRE header fields (not NVGRE), not even optional
fields.
There is also no flow id in the GRE header, should you be referring to
NVGRE?

Thanks.


so not sufficient if there are more than 512 queues
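The outer-header entropy technique discussed earlier in this thread (VXLAN/GENEVE-style: fold a hash of the inner flow into the outer UDP source port, within the dynamic range RFC 7348 recommends) can be sketched as (hypothetical helper, illustrative only):

```python
import zlib

def outer_udp_src_port(inner_5tuple) -> int:
    # Fold a hash of the inner 5-tuple into the outer UDP source port,
    # keeping it in the dynamic/private range 49152-65535 as RFC 7348
    # recommends for VXLAN encapsulators.
    h = zlib.crc32(repr(inner_5tuple).encode())
    return 49152 + (h % 16384)

p1 = outer_udp_src_port(("10.0.0.1", "10.0.0.2", 6, 1234, 80))
assert 49152 <= p1 <= 65535
```

All packets of one inner flow get the same outer source port, so an outer-header RSS hash already spreads distinct inner flows; this is the entropy the inner header hash would otherwise have to recover.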

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-30 Thread Heng Qi




在 2023/3/21 上午3:48, Michael S. Tsirkin 写道:

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:

We use the most basic GRE header fields (not NVGRE), not even optional
fields.

I'd say yes, the most convincing usecase is with legacy GRE.


Yes. But we still have a strong need for VXLAN and GENEVE to do 
symmetric hashing. Please consider this.



Given that, do you need the rest of protocols there?


I would say that I checked the current tunneling protocols used for 
overlay networks and their respective RFC versions compared to each other.


They are:

1. GRE_rfc2784 :This protocol is only specified for IPv4 and used as 
either the payload or delivery protocol.

    link : https://datatracker.ietf.org/doc/rfc2784/

2. GRE_rfc2890: This protocol describes extensions by which two fields, 
Key and Sequence Number, can be optionally carried in the GRE Header.

    link: https://www.rfc-editor.org/rfc/rfc2890

3. GRE_rfc7676: IPv6 Support for Generic Routing Encapsulation (GRE). 
This protocol is specified for IPv6 and used as either the payload or 
delivery protocol.
    Note that this does not change the GRE header format or any 
behaviors specified by RFC 2784 or RFC 2890.

    link: https://datatracker.ietf.org/doc/rfc7676/

4. GRE-in-UDP: GRE-in-UDP Encapsulation. This specifies a method of 
encapsulating network protocol packets within GRE and UDP headers.
    This GRE-in-UDP encapsulation allows the UDP source port field to 
be used as an entropy field. This protocol is specified for IPv4 and 
IPv6, and used as either the payload or delivery protocol.

    link: https://www.rfc-editor.org/rfc/rfc8086

5. VXLAN: Virtual eXtensible Local Area Network.
    link: https://datatracker.ietf.org/doc/rfc7348/

6. VXLAN-GPE: Generic Protocol Extension for VXLAN. This protocol 
describes extending Virtual eXtensible Local Area Network (VXLAN) via 
changes to the VXLAN header.

    link: https://www.ietf.org/archive/id/draft-ietf-nvo3-vxlan-gpe-12.txt

7. GENEVE: Generic Network Virtualization Encapsulation.
    link: https://datatracker.ietf.org/doc/rfc8926/

8. IPIP: IP Encapsulation within IP.
    link: https://www.rfc-editor.org/rfc/rfc2003

9. NVGRE: Network Virtualization Using Generic Routing Encapsulation
    link: https://www.rfc-editor.org/rfc/rfc7637.html

10. STT: Stateless Transport Tunneling. STT is particularly useful when 
some tunnel endpoints are in end-systems, as it utilizes the 
capabilities of the network interface card to improve performance.

  link: https://www.ietf.org/archive/id/draft-davie-stt-08.txt

Among them, GRE_rfc2784, VXLAN and GENEVE are our internal requirements 
for inner header hashing.

GRE_rfc2784 requires RSS hashing to different queues.
For the monitoring scenario I mentioned, VXLAN or GRE_rfc2890 also needs 
to use inner symmetric hashing.


I know you mean to want this feature to only support GRE_rfc2784, since 
it's the most convincing for RSS.

But RSS hashes packets to different queues for different streams.
For the same flow, it needs to hash it to the same queue.
So this doesn't distort the role of RSS, and I believe that for modern 
protocols like VXLAN and others, inner symmetric hashing is still a 
common requirement for other vendors using virtio devices.


So, can we make this feature support all the protocols I have checked
above, so that vendors can choose which protocols to support?
This would also avoid, as much as possible, the addition of new tunnel
protocols in the near future.

Do you think it's ok?

Again: I'm so sorry I didn't realize I missed this until I checked my 
emails. 🙁😮



We can start with just legacy GRE (think about including IPv6 or not).
Given how narrow this usecase is, I'd be fine with focusing
just on this, and addressing more protocols down the road
with something programmable like BPF. WDYT?







Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-04-10 Thread Heng Qi




在 2023/4/8 下午6:29, Michael S. Tsirkin 写道:

On Thu, Mar 30, 2023 at 08:37:21PM +0800, Heng Qi wrote:


在 2023/3/21 上午3:48, Michael S. Tsirkin 写道:

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:

We use the most basic GRE header fields (not NVGRE), not even optional
fields.

I'd say yes, the most convincing usecase is with legacy GRE.

Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric
hashing. Please consider this.

Using a specific key seems fragile though in that a different one is
needed for e.g. ipv4 and ipv6.  An issue with VXLAN and GENEVE, yes?


Yes.


Will support for XOR hashing address this sufficiently or is that not
acceptable to you? Or alternatively a modified Toeplitz, e.g. this


This is a very good suggestion; I will follow up on this work, as I
have expressed in other threads.


Thanks.


https://inbox.dpdk.org/dev/20190731123040.gg4...@6wind.com/
suggests Mellanox supports that. WDYT?







[virtio-dev] Re: [PATCH v12] virtio-net: support inner header hash

2023-04-11 Thread Heng Qi




在 2023/4/12 上午5:03, Parav Pandit 写道:



On 4/3/2023 12:58 AM, Heng Qi wrote:

To achieve this, the device can calculate a suitable hash based on 
the inner headers
of this flow, for example using the Toeplitz combined with a 
symmetric hash key.


I am not sure you need a symmetric hash key. Toeplitz with symmetric
hashing is possible without a symmetric key too.


So just mentioning it as a 'combined with symmetric hashing' is enough.


Yes, as discussed with Michael, we will also support XOR hashing or 
Toeplitz symmetric hashing or even both, I'm thinking of starting a 
draft in a separate thread to let us have initial discussions.


The statement here I will modify.



  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device 
Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -198,6 +202,7 @@ \subsection{Device configuration 
layout}\label{sec:Device Types / Network Device

  u8 rss_max_key_size;
  le16 rss_max_indirection_table_length;
  le32 supported_hash_types;
+    le32 supported_tunnel_hash_types;
  };
Given that a set command is added via cvq, it makes sense to also do
the symmetric work to get it via a cvq.
This is also similar to the latest work on notification coalescing
for VQ, where get and set are done using a single channel = cvq.


Only set is given to match the existing RSS/HASH configuration methods. 
But we should really look ahead as you suggest.




Granted that RSS and other fields are done differently, but it was bit 
in the past.


Yes.



With that no need to define two fields at two different places in 
config area and also in cvq.


Just the new opcode is needed for GET and things will be fine.


Right.



+If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the device supports 
inner header hash and
+the driver can configure the inner header hash calculation for 
encapsulated packets \ref{sec:Device Types / Network Device / Device 
Operation / Processing of Incoming Packets / Hash calculation for 
incoming packets / Tunnel/Encapsulated packet}
+by issuing the command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG from the 
class VIRTIO_NET_CTRL_MQ.
+The command sets \field{hash_tunnel_types} in the structure 
virtio_net_hash_tunnel_config.

+
+struct virtio_net_hash_tunnel_config {
+    le32 hash_tunnel_types;
+};
+

VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG_SET
and
VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG_GET


Will do.
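As a rough sketch of the command payload under discussion (the struct layout is from the quoted draft; the bit values here are hypothetical placeholders, not the spec's assignments):

```python
import struct

# Hypothetical bit assignments for illustration only; the actual values
# are defined by the spec draft under review.
VIRTIO_NET_HASH_TUNNEL_TYPE_GRE   = 1 << 0
VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN = 1 << 1

def hash_tunnel_config(hash_tunnel_types: int) -> bytes:
    # struct virtio_net_hash_tunnel_config { le32 hash_tunnel_types; };
    return struct.pack("<I", hash_tunnel_types)

payload = hash_tunnel_config(VIRTIO_NET_HASH_TUNNEL_TYPE_GRE |
                             VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN)
assert payload == b"\x03\x00\x00\x00"  # le32 bitmask, little-endian
```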



+Field \field{hash_tunnel_types} contains a bitmask of supported hash 
tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation 
/ Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash tunnel types}.

+
+\subparagraph{Tunnel/Encapsulated packet}
+\label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / Tunnel/Encapsulated packet}

+
+A tunnel packet is encapsulated from the original packet based on 
the tunneling
+protocol (only a single level of encapsulation is currently 
supported). The
+encapsulated packet contains an outer header and an inner header, 
and the device

+calculates the hash over either the inner header or the outer header.
+
+If VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received 
encapsulated packet's
+outer header matches one of the supported \field{hash_tunnel_types}, 
the hash

+of the inner header is calculated.

s/one of the supported/one of the configured/
Because support comes from GET or config space area; out of which a 
smaller or equal subset of tunnel types are configured.


Yes, configured is obviously more accurate than supported.



+\devicenormative{\subparagraph}{Inner Header Hash}{Device Types / 
Network Device / Device Operation / Control Virtqueue / Inner Header 
Hash}

+
+The device MUST calculate the outer header hash if the received 
encapsulated packet has an encapsulation type not in 
\field{supported_tunnel_hash_types}.

+

Since the configured set can be smaller, a better reword is:
The device MUST calculate the hash from the outer header if the 
received encapsulated packet type does not match hash_tunnel_types.


Will modify.



+The device MUST respond to the VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG 
command with VIRTIO_NET_ERR if the device
+received an unrecognized or unsupported VIRTIO_NET_HASH_TUNNEL_TYPE_ 
flag.

+
+Upon reset, the device MUST initialize \field{hash_tunnel_type} to 0.
+
+\drivernormative{\subparagraph}{Inner Header Hash}{Device Types / 
Network Device / Device Operation / Control Virtqueue / Inner Header 
Hash}

+
+The driver MUST have negotiated the feature VIRTIO_NET_F_HASH_TUNNEL 
when issuing the command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG.

+
+The driver MUST NOT set any VIRTIO_NET_HASH_TUNNEL_TYPE_ flags that 
are not supported by the device.

+
  \paragraph{Hash reporting for incoming packets}
  \label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets 

Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v12] virtio-net: support inner header hash

2023-04-13 Thread Heng Qi




在 2023/4/14 上午5:43, Michael S. Tsirkin 写道:

On Thu, Apr 13, 2023 at 07:03:26PM +0800, Heng Qi wrote:

   For example, when the packets of certain
+tunnels are spread across multiple receive queues, these receive
queues may have an unbalanced
+amount of packets. This can cause a specific receive queue to
become full, resulting in packet loss.


We have many places that can lead to packet dropping. For example, the
automatic steering is best effort. I tend to avoid mentioning things
like this.

Ok. And Michael what do you think about this?


I think this text did not do a great job explaining the
security aspect. Here's a better, shorter explanation:

It is often an expectation of users that a tunnel isolates the external
network from the internal one. By completely ignoring entropy in the
external header and replacing it with entropy from the internal header,
for hash calculations, this expectation might be violated to a certain
extent, depending on how the hash is used. When the hash use is limited
to RSS queue selection, the effect will likely be limited to ability of
users inside the tunnel to cause packet drops in multiple queues (as
opposed to a single queue without the feature).


Sure. Will do  in the v13.






+
+Possible mitigations:
+\begin{itemize}
+\item Use a tool with good forwarding performance to keep the
receive queue from filling up.
+\item If the QoS is unavailable, the driver can set
\field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
+  to disable inner header hash for encapsulated packets.
+\item Choose a hash key that can avoid queue collisions.
+\item Perform appropriate QoS before packets consume the receive
buffers of the receive queues.
+\end{itemize}
+
+The limitations mentioned above exist with/without the inner header
hash.


This conflicts with the title "Tunnel QoS limitation", which readers may
think applies only to tunnels.

Perhaps a "QoS Advices" is better?

Plural of "advice" is "advice" not "advices".


My fault.😅



This advice is somewhat bogus though.

The point I keep trying to make is that this:

Choose a hash key that can avoid queue collisions.

is impossible with the feature and possible without.


I don't think so; the outer headers also have corresponding entropy for 
different streams.


Thanks.


This was the whole reason I asked for a security
considerations sections.



Thanks!


Thanks







Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v12] virtio-net: support inner header hash

2023-04-13 Thread Heng Qi




在 2023/4/14 上午11:10, Jason Wang 写道:

On Fri, Apr 14, 2023 at 5:46 AM Michael S. Tsirkin  wrote:

On Thu, Apr 13, 2023 at 07:03:26PM +0800, Heng Qi wrote:

   For example, when the packets of certain
+tunnels are spread across multiple receive queues, these receive
queues may have an unbalanced
+amount of packets. This can cause a specific receive queue to
become full, resulting in packet loss.


We have many places that can lead to packet dropping. For example, the
automatic steering is best effort. I tend to avoid mentioning things
like this.

Ok. And Michael what do you think about this?


I think this text did not do a great job explaining the
security aspect. Here's a better, shorter explanation:

 It is often an expectation of users that a tunnel isolates the external
 network from the internal one. By completely ignoring entropy in the
 external header and replacing it with entropy from the internal header,
 for hash calculations, this expectation might be violated to a certain
 extent, depending on how the hash is used. When the hash use is limited
 to RSS queue selection, the effect will likely be limited to ability of
 users inside the tunnel to cause packet drops in multiple queues (as
 opposed to a single queue without the feature).

And this is only for GRE-in-UDP? This makes me think if we should add
GRE support for the outer header like:

https://docs.napatech.com/r/Feature-Set-N-ANL9/Hash-Key-Type-10-3-Tuple-GREv0


I think this is for tunneling protocols with specific flow fields, such 
as the GRE key field and the NVGRE FlowID field.


This would require us to specify how the hash is calculated for these 
tunnels when F_TUNNEL_HASH is not negotiated. That is new work.


Thanks.



Thanks





+
+Possible mitigations:
+\begin{itemize}
+\item Use a tool with good forwarding performance to keep the
receive queue from filling up.
+\item If the QoS is unavailable, the driver can set
\field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
+  to disable inner header hash for encapsulated packets.
+\item Choose a hash key that can avoid queue collisions.
+\item Perform appropriate QoS before packets consume the receive
buffers of the receive queues.
+\end{itemize}
+
+The limitations mentioned above exist with/without the inner header
hash.


This conflicts with the title "Tunnel QoS limitation", which readers may
think applies only to tunnels.

Perhaps a "QoS Advices" heading is better?

Plural of "advice" is "advice" not "advices".

This advice is somewhat bogus though.

The point I keep trying to make is that this:

 Choose a hash key that can avoid queue collisions.

is impossible with the feature and possible without.
This was the whole reason I asked for a security
considerations sections.

--
MST






[virtio-dev] Re: [PATCH v13] virtio-net: support inner header hash

2023-04-26 Thread Heng Qi




在 2023/4/26 上午4:28, Parav Pandit 写道:



On 4/23/2023 3:35 AM, Heng Qi wrote:
    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device 
Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -198,6 +202,7 @@ \subsection{Device configuration 
layout}\label{sec:Device Types / Network Device

  u8 rss_max_key_size;
  le16 rss_max_indirection_table_length;
  le32 supported_hash_types;
+    le32 supported_tunnel_hash_types;
  };
In v12 I asked to move the above field from the config area 
to the GET command, in comment [1]:


"With that no need to define two fields at two different places in 
config area and also in cvq."


I'm not sure this is sufficiently motivated; RSS also has 
supports_hash_types in config space.


We don't actually need cvq and config to sync on 
supported_tunnel_hash_types, since it doesn't need to change (meaning 
supported_tunnel_hash_types doesn't send configuration change 
notifications).




I am sorry if that was not clear enough.

[1] 
https://lore.kernel.org/virtio-dev/569cbaf9-f1fb-0e1f-a2ef-b1d7cd7db...@nvidia.com/



  \subparagraph{Supported/enabled hash types}
  \label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash types}
+This paragraph relies on definitions from \hyperref[intro:IP]{[IP]}, 
\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.

  Hash types applicable for IPv4 packets:
  \begin{lstlisting}
  #define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
@@ -980,6 +993,152 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  (see \ref{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Hash calculation for incoming 
packets / IPv6 packets without extension header}).

  \end{itemize}
  +\paragraph{Inner Header Hash}
+\label{sec:Device Types / Network Device / Device Operation / 
Processing of Incoming Packets / Inner Header Hash}

+
+If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the device supports 
inner header hash and the driver can send
+commands VIRTIO_NET_CTRL_TUNNEL_HASH_SET and 
VIRTIO_NET_CTRL_TUNNEL_HASH_GET for the inner header hash configuration.

+
+struct virtio_net_hash_tunnel_config {

Please move field from the config struct to here. Both are RO fields.

le32 supported_hash_tunnel_types;

+    le32 hash_tunnel_types;
+};
+
+#define VIRTIO_NET_CTRL_TUNNEL_HASH 7
+ #define VIRTIO_NET_CTRL_TUNNEL_HASH_SET 0
+ #define VIRTIO_NET_CTRL_TUNNEL_HASH_GET 1
+
+Field \field{hash_tunnel_types} contains a bitmask of configured 
hash tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation 
/ Processing of Incoming Packets / Hash calculation for incoming 
packets / Supported/enabled hash tunnel types}.

+
+The class VIRTIO_NET_CTRL_TUNNEL_HASH has the following commands:
+\begin{itemize}
+\item VIRTIO_NET_CTRL_TUNNEL_HASH_SET: set the 
\field{hash_tunnel_types} to configure the inner header hash 
calculation for the device.
+\item VIRTIO_NET_CTRL_TUNNEL_HASH_GET: get the 
\field{hash_tunnel_types} from the device.

+\end{itemize}
+
+For the command VIRTIO_NET_CTRL_TUNNEL_HASH_SET, the structure 
virtio_net_hash_tunnel_config is write-only for the driver.
+For the command VIRTIO_NET_CTRL_TUNNEL_HASH_GET, the structure 
virtio_net_hash_tunnel_config is read-only for the driver.

+
You need to split the structure into two in the above description, one 
for get and one for set, as get and set contain different fields.

+
+If VIRTIO_NET_HASH_TUNNEL_TYPE_NONE is set or the encapsulation type 
is not included in \field{hash_tunnel_types},
+the hash of the outer header is calculated for the received 
encapsulated packet.

+
+
+For scenarios with sufficient external entropy or no internal 
hashing requirements, inner header hash may not be needed:
+A tunnel is often expected to isolate the external network from the 
internal one. By completely ignoring entropy
+in the external header and replacing it with entropy from the 
internal header, for hash calculations, this expectation

You wanted to say inner here like rest of the places.

s/internal header/inner header


I wanted 'external' and 'internal' to correspond, but avoiding 'internal 
header' and using 'inner header' uniformly is also reasonable. :)





+The driver MUST NOT set any VIRTIO_NET_HASH_TUNNEL_TYPE_ flags that 
are not supported by the device.

Multiple flags so,

s/flags that are/flags which are/


Will fix.

Thanks!





Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v13] virtio-net: support inner header hash

2023-04-26 Thread Heng Qi




在 2023/4/26 下午10:24, Parav Pandit 写道:



From: Heng Qi 
Sent: Wednesday, April 26, 2023 10:04 AM
Yes, but that seems like a tiny cost, and the cvq command-related structure is
much simpler.

Current structure size is 24 bytes.
This size multiplies with the device count, must scale to be always 
available, and rarely changes.

As we add new features, such device capabilities grow, making the 
multiplier bigger.
For example
a. flow steering capabilities (how many flows, what mask, supported protocols, 
generic options)
b. hds capabilities
c. counter capabilities (histogram based, which error counters supported, etc)
d. which new type of tx vq improvements supported.
e. hw gro context count supported

May be more..

Depending on the container/VM size certain capabilities may change from device 
to device.
Hence it is hard to deduplicate them at device level.


This makes sense. In general, we should be careful about adding things 
to the device space unless the benefit is non-trivial.


Thanks.



Therefore, the ability to query them over a transport that is not always 
available is the preferred choice for the device.

A driver may choose to cache them if they are frequently accessed, or ask 
the device when needed. Even when cached by the driver, the data comes 
from a component that doesn't have transport-level timeouts associated 
with it.






Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-04-26 Thread Heng Qi




在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:

On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:

This does not mean that every device needs to implement and support all of
these; they can choose to support the protocols they want.

I add these because we have large-scale application scenarios for modern
protocols VXLAN-GPE/GENEVE:

+\item In scenarios where the same flow passing through different tunnels is 
expected to be received in the same queue,
+  warm caches, less lock contention, etc. improve receiving 
performance.


Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a little crossover.

Thanks.

But VXLAN-GPE/GENEVE can use source port for entropy.

It is recommended that the UDP source port number
 be calculated using a hash of fields from the inner packet

That is best because
it allows end to end control and is protocol agnostic.


Yes, I agree with this; I don't think we have an argument on this point 
right now. :)


For VXLAN-GPE/GENEVE or other modern tunneling protocols, we have to 
deal with

scenarios where the same flow passes through different tunnels.

Having them hashed to the same rx queue is hard to do via outer headers.


All that is missing is symmetric Toeplitz and all is well?


The scenarios above or in the commit log also require inner headers.


Thanks.









[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-05-05 Thread Heng Qi
On Thu, Apr 27, 2023 at 01:13:29PM -0400, Michael S. Tsirkin wrote:
> On Thu, Apr 27, 2023 at 10:28:29AM +0800, Heng Qi wrote:
> > 
> > 
> > 在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:
> > > On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:
> > > > This does not mean that every device needs to implement and support all 
> > > > of
> > > > these, they can choose to support some protocols they want.
> > > > 
> > > > I add these because we have scale application scenarios for modern 
> > > > protocols
> > > > VXLAN-GPE/GENEVE:
> > > > 
> > > > +\item In scenarios where the same flow passing through different 
> > > > tunnels is expected to be received in the same queue,
> > > > +  warm caches, lessing locking, etc. are optimized to obtain 
> > > > receiving performance.
> > > > 
> > > > 
> > > > Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a little 
> > > > crossover.
> > > > 
> > > > Thanks.
> > > But VXLAN-GPE/GENEVE can use source port for entropy.
> > > 
> > >   It is recommended that the UDP source port number
> > >be calculated using a hash of fields from the inner packet
> > > 
> > > That is best because
> > > it allows end to end control and is protocol agnostic.
> > 
> > Yes. I agree with this, I don't think we have an argument on this point
> > right now.:)
> > 
> > For VXLAN-GPE/GENEVE or other modern tunneling protocols, we have to deal
> > with
> > scenarios where the same flow passes through different tunnels.
> > 
> > Having them hashed to the same rx queue, is hard to do via outer headers.
> > > All that is missing is symmetric Toepliz and all is well?
> > 
> > The scenarios above or in the commit log also require inner headers.
> 
> Hmm I am not sure I get it 100%.
> Could you show an example with inner header hash in the port #,
> hash is symmetric, and you still have trouble?
> 
> 
> It kinds of sounds like not enough entropy is not the problem
> at this point.

Sorry for the late reply. :)

For modern tunneling protocols, yes.

> You now want to drop everything from the header
> except the UDP source port. Is that a fair summary?
> 

For example, for the same flow passing through different VXLAN tunnels,
packets in this flow have the same inner header and different outer
headers. Sometimes these packets of the flow need to be hashed to the
same rxq, then we can use the inner header as the hash input.

Thanks!




Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-05-09 Thread Heng Qi




在 2023/5/5 下午10:56, Michael S. Tsirkin 写道:

On Fri, May 05, 2023 at 09:51:15PM +0800, Heng Qi wrote:

On Thu, Apr 27, 2023 at 01:13:29PM -0400, Michael S. Tsirkin wrote:

On Thu, Apr 27, 2023 at 10:28:29AM +0800, Heng Qi wrote:


在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:

On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:

This does not mean that every device needs to implement and support all of
these, they can choose to support some protocols they want.

I add these because we have scale application scenarios for modern protocols
VXLAN-GPE/GENEVE:

+\item In scenarios where the same flow passing through different tunnels is 
expected to be received in the same queue,
+  warm caches, lessing locking, etc. are optimized to obtain receiving 
performance.


Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a little crossover.

Thanks.

But VXLAN-GPE/GENEVE can use source port for entropy.

It is recommended that the UDP source port number
 be calculated using a hash of fields from the inner packet

That is best because
it allows end to end control and is protocol agnostic.

Yes. I agree with this, I don't think we have an argument on this point
right now.:)

For VXLAN-GPE/GENEVE or other modern tunneling protocols, we have to deal
with
scenarios where the same flow passes through different tunnels.

Having them hashed to the same rx queue, is hard to do via outer headers.

All that is missing is symmetric Toepliz and all is well?

The scenarios above or in the commit log also require inner headers.

Hmm I am not sure I get it 100%.
Could you show an example with inner header hash in the port #,
hash is symmetric, and you still have trouble?


It kinds of sounds like not enough entropy is not the problem
at this point.

Sorry for the late reply. :)

For modern tunneling protocols, yes.


You now want to drop everything from the header
except the UDP source port. Is that a fair summary?


For example, for the same flow passing through different VXLAN tunnels,
packets in this flow have the same inner header and different outer
headers. Sometimes these packets of the flow need to be hashed to the
same rxq, then we can use the inner header as the hash input.

Thanks!

So, they will have the same source port yes?


Yes. The outer source port can be calculated from the 5-tuple of the 
original packet; after the two directions of the same flow pass through 
different tunnels, the outer ports are the same but the outer IPs differ.



Any way to use that


We use it in monitoring, firewall and other scenarios.


so we don't depend on a specific protocol?


Yes, selected tunneling protocols can be used in this scenario like this.

Thanks.






[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-05-10 Thread Heng Qi




在 2023/5/9 下午11:15, Michael S. Tsirkin 写道:

On Tue, May 09, 2023 at 10:22:19PM +0800, Heng Qi wrote:


在 2023/5/5 下午10:56, Michael S. Tsirkin 写道:

On Fri, May 05, 2023 at 09:51:15PM +0800, Heng Qi wrote:

On Thu, Apr 27, 2023 at 01:13:29PM -0400, Michael S. Tsirkin wrote:

On Thu, Apr 27, 2023 at 10:28:29AM +0800, Heng Qi wrote:

在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:

On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:

This does not mean that every device needs to implement and support all of
these, they can choose to support some protocols they want.

I add these because we have scale application scenarios for modern protocols
VXLAN-GPE/GENEVE:

+\item In scenarios where the same flow passing through different tunnels is 
expected to be received in the same queue,
+  warm caches, lessing locking, etc. are optimized to obtain receiving 
performance.


Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a little crossover.

Thanks.

But VXLAN-GPE/GENEVE can use source port for entropy.

It is recommended that the UDP source port number
 be calculated using a hash of fields from the inner packet

That is best because
it allows end to end control and is protocol agnostic.

Yes. I agree with this, I don't think we have an argument on this point
right now.:)

For VXLAN-GPE/GENEVE or other modern tunneling protocols, we have to deal
with
scenarios where the same flow passes through different tunnels.

Having them hashed to the same rx queue, is hard to do via outer headers.

All that is missing is symmetric Toepliz and all is well?

The scenarios above or in the commit log also require inner headers.

Hmm I am not sure I get it 100%.
Could you show an example with inner header hash in the port #,
hash is symmetric, and you still have trouble?


It kinds of sounds like not enough entropy is not the problem
at this point.

Sorry for the late reply. :)

For modern tunneling protocols, yes.


You now want to drop everything from the header
except the UDP source port. Is that a fair summary?


For example, for the same flow passing through different VXLAN tunnels,
packets in this flow have the same inner header and different outer
headers. Sometimes these packets of the flow need to be hashed to the
same rxq, then we can use the inner header as the hash input.

Thanks!

So, they will have the same source port yes?

Yes. The outer source port can be calculated using the 5-tuple of the
original packet,
and the outer ports are the same but the outer IPs are different after
different directions of the same flow pass through different tunnels.

Any way to use that

We use it in monitoring, firewall and other scenarios.


so we don't depend on a specific protocol?

Yes, selected tunneling protocols can be used in this scenario like this.

Thanks.


No, the question was - can we generalize this somehow then?
For example, a flag to ignore source IP when hashing?
Or maybe just for UDP packets?


1. I think the common solution is based on the inner header, so that 
GRE/IPIP tunnels can also enjoy inner symmetric hashing.


2. The VXLAN spec does not state that the outer source port in both 
directions of the same flow must be the same [1]
(although in the kernel the outer source port is calculated with a 
consistent hash, which sorts the five-tuple before hashing),
but it is best not to assume that consistent hashing is used in all 
VXLAN implementations. The GENEVE spec uses "SHOULD" [2].


3. How would we generalize this? The device would advertise, via a feature, 
all the tunnel types it supports and hash those tunnel types using the 
outer source port; then we would still have to list the specific tunneling 
protocols supported by the device, just as we do now.


[1] "Source Port: It is recommended that the UDP source port number be 
calculated using a hash of fields from the inner packet -- one example
being a hash of the inner Ethernet frame's headers. This is to enable a 
level of entropy for the ECMP/load-balancing of the VM-to-VM traffic across
the VXLAN overlay. When calculating the UDP source port number in this 
manner, it is RECOMMENDED that the value be in the dynamic/private

port range 49152-65535 [RFC6335] "

[2] "Source Port: A source port selected by the originating tunnel 
endpoint. This source port SHOULD be the same for all packets belonging to a
single encapsulated flow to prevent reordering due to the use of 
different paths. To encourage an even distribution of flows across 
multiple links,
the source port SHOULD be calculated using a hash of the encapsulated 
packet headers using, for example, a traditional 5-tuple. Since the port
represents a flow identifier rather than a true UDP connection, the 
entire 16-bit range MAY be used to maximize entropy. In addition to 
setting the
source port, for IPv6, the flow label MAY also be used for providing 
entropy. For an example of using the IP

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-05-11 Thread Heng Qi
On Thu, May 11, 2023 at 02:22:12AM -0400, Michael S. Tsirkin wrote:
> On Wed, May 10, 2023 at 05:15:37PM +0800, Heng Qi wrote:
> > 
> > 
> > 在 2023/5/9 下午11:15, Michael S. Tsirkin 写道:
> > > On Tue, May 09, 2023 at 10:22:19PM +0800, Heng Qi wrote:
> > > > 
> > > > 在 2023/5/5 下午10:56, Michael S. Tsirkin 写道:
> > > > > On Fri, May 05, 2023 at 09:51:15PM +0800, Heng Qi wrote:
> > > > > > On Thu, Apr 27, 2023 at 01:13:29PM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Apr 27, 2023 at 10:28:29AM +0800, Heng Qi wrote:
> > > > > > > > 在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:
> > > > > > > > > On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:
> > > > > > > > > > This does not mean that every device needs to implement and 
> > > > > > > > > > support all of
> > > > > > > > > > these, they can choose to support some protocols they want.
> > > > > > > > > > 
> > > > > > > > > > I add these because we have scale application scenarios for 
> > > > > > > > > > modern protocols
> > > > > > > > > > VXLAN-GPE/GENEVE:
> > > > > > > > > > 
> > > > > > > > > > +\item In scenarios where the same flow passing through 
> > > > > > > > > > different tunnels is expected to be received in the same 
> > > > > > > > > > queue,
> > > > > > > > > > +  warm caches, lessing locking, etc. are optimized to 
> > > > > > > > > > obtain receiving performance.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a 
> > > > > > > > > > little crossover.
> > > > > > > > > > 
> > > > > > > > > > Thanks.
> > > > > > > > > But VXLAN-GPE/GENEVE can use source port for entropy.
> > > > > > > > > 
> > > > > > > > >   It is recommended that the UDP source port number
> > > > > > > > >be calculated using a hash of fields from the inner 
> > > > > > > > > packet
> > > > > > > > > 
> > > > > > > > > That is best because
> > > > > > > > > it allows end to end control and is protocol agnostic.
> > > > > > > > Yes. I agree with this, I don't think we have an argument on 
> > > > > > > > this point
> > > > > > > > right now.:)
> > > > > > > > 
> > > > > > > > For VXLAN-GPE/GENEVE or other modern tunneling protocols, we 
> > > > > > > > have to deal
> > > > > > > > with
> > > > > > > > scenarios where the same flow passes through different tunnels.
> > > > > > > > 
> > > > > > > > Having them hashed to the same rx queue, is hard to do via 
> > > > > > > > outer headers.
> > > > > > > > > All that is missing is symmetric Toepliz and all is well?
> > > > > > > > The scenarios above or in the commit log also require inner 
> > > > > > > > headers.
> > > > > > > Hmm I am not sure I get it 100%.
> > > > > > > Could you show an example with inner header hash in the port #,
> > > > > > > hash is symmetric, and you still have trouble?
> > > > > > > 
> > > > > > > 
> > > > > > > It kinds of sounds like not enough entropy is not the problem
> > > > > > > at this point.
> > > > > > Sorry for the late reply. :)
> > > > > > 
> > > > > > For modern tunneling protocols, yes.
> > > > > > 
> > > > > > > You now want to drop everything from the header
> > > > > > > except the UDP source port. Is that a fair summary?
> > > > > > > 
> > > > > > For example, for the same flow passing through different VXLAN 
> > > > > > tunnels,
> > > > > > packets in this flow have the same inner header

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-05-12 Thread Heng Qi
On Fri, May 12, 2023 at 02:54:34AM -0400, Michael S. Tsirkin wrote:
> On Fri, May 12, 2023 at 02:00:19PM +0800, Heng Qi wrote:
> > On Thu, May 11, 2023 at 02:22:12AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, May 10, 2023 at 05:15:37PM +0800, Heng Qi wrote:
> > > > 
> > > > 
> > > > 在 2023/5/9 下午11:15, Michael S. Tsirkin 写道:
> > > > > On Tue, May 09, 2023 at 10:22:19PM +0800, Heng Qi wrote:
> > > > > > 
> > > > > > 在 2023/5/5 下午10:56, Michael S. Tsirkin 写道:
> > > > > > > On Fri, May 05, 2023 at 09:51:15PM +0800, Heng Qi wrote:
> > > > > > > > On Thu, Apr 27, 2023 at 01:13:29PM -0400, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Thu, Apr 27, 2023 at 10:28:29AM +0800, Heng Qi wrote:
> > > > > > > > > > 在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:
> > > > > > > > > > > On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:
> > > > > > > > > > > > This does not mean that every device needs to implement 
> > > > > > > > > > > > and support all of
> > > > > > > > > > > > these, they can choose to support some protocols they 
> > > > > > > > > > > > want.
> > > > > > > > > > > > 
> > > > > > > > > > > > I add these because we have scale application scenarios 
> > > > > > > > > > > > for modern protocols
> > > > > > > > > > > > VXLAN-GPE/GENEVE:
> > > > > > > > > > > > 
> > > > > > > > > > > > +\item In scenarios where the same flow passing through 
> > > > > > > > > > > > different tunnels is expected to be received in the 
> > > > > > > > > > > > same queue,
> > > > > > > > > > > > +  warm caches, lessing locking, etc. are optimized 
> > > > > > > > > > > > to obtain receiving performance.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has 
> > > > > > > > > > > > a little crossover.
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks.
> > > > > > > > > > > But VXLAN-GPE/GENEVE can use source port for entropy.
> > > > > > > > > > > 
> > > > > > > > > > >   It is recommended that the UDP source port number
> > > > > > > > > > >be calculated using a hash of fields from the inner 
> > > > > > > > > > > packet
> > > > > > > > > > > 
> > > > > > > > > > > That is best because
> > > > > > > > > > > it allows end to end control and is protocol agnostic.
> > > > > > > > > > Yes. I agree with this, I don't think we have an argument 
> > > > > > > > > > on this point
> > > > > > > > > > right now.:)
> > > > > > > > > > 
> > > > > > > > > > For VXLAN-GPE/GENEVE or other modern tunneling protocols, 
> > > > > > > > > > we have to deal
> > > > > > > > > > with
> > > > > > > > > > scenarios where the same flow passes through different 
> > > > > > > > > > tunnels.
> > > > > > > > > > 
> > > > > > > > > > Having them hashed to the same rx queue, is hard to do via 
> > > > > > > > > > outer headers.
> > > > > > > > > > > All that is missing is symmetric Toepliz and all is well?
> > > > > > > > > > The scenarios above or in the commit log also require inner 
> > > > > > > > > > headers.
> > > > > > > > > Hmm I am not sure I get it 100%.
> > > > > > > > > Could you show an example with inner header hash in the port 
> > > > > > > > > #,

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v13] virtio-net: support inner header hash

2023-05-14 Thread Heng Qi




在 2023/5/12 下午7:27, Michael S. Tsirkin 写道:

On Fri, May 12, 2023 at 03:23:46PM +0800, Heng Qi wrote:

On Fri, May 12, 2023 at 02:54:34AM -0400, Michael S. Tsirkin wrote:

On Fri, May 12, 2023 at 02:00:19PM +0800, Heng Qi wrote:

On Thu, May 11, 2023 at 02:22:12AM -0400, Michael S. Tsirkin wrote:

On Wed, May 10, 2023 at 05:15:37PM +0800, Heng Qi wrote:


在 2023/5/9 下午11:15, Michael S. Tsirkin 写道:

On Tue, May 09, 2023 at 10:22:19PM +0800, Heng Qi wrote:

在 2023/5/5 下午10:56, Michael S. Tsirkin 写道:

On Fri, May 05, 2023 at 09:51:15PM +0800, Heng Qi wrote:

On Thu, Apr 27, 2023 at 01:13:29PM -0400, Michael S. Tsirkin wrote:

On Thu, Apr 27, 2023 at 10:28:29AM +0800, Heng Qi wrote:

在 2023/4/26 下午10:48, Michael S. Tsirkin 写道:

On Wed, Apr 26, 2023 at 10:14:30PM +0800, Heng Qi wrote:

This does not mean that every device needs to implement and support all of
these, they can choose to support some protocols they want.

I add these because we have scale application scenarios for modern protocols
VXLAN-GPE/GENEVE:

+\item In scenarios where the same flow passing through different tunnels is 
expected to be received in the same queue,
+  warm caches, lessing locking, etc. are optimized to obtain receiving 
performance.


Maybe the legacy GRE, VXLAN-GPE and GENEVE? But it has a little crossover.

Thanks.

But VXLAN-GPE/GENEVE can use source port for entropy.

It is recommended that the UDP source port number
 be calculated using a hash of fields from the inner packet

That is best because
it allows end to end control and is protocol agnostic.

Yes. I agree with this, I don't think we have an argument on this point
right now.:)

For VXLAN-GPE/GENEVE or other modern tunneling protocols, we have to deal
with
scenarios where the same flow passes through different tunnels.

Having them hashed to the same rx queue, is hard to do via outer headers.

All that is missing is symmetric Toepliz and all is well?

The scenarios above or in the commit log also require inner headers.

Hmm I am not sure I get it 100%.
Could you show an example with inner header hash in the port #,
hash is symmetric, and you still have trouble?


It kinds of sounds like not enough entropy is not the problem
at this point.

Sorry for the late reply. :)

For modern tunneling protocols, yes.


You now want to drop everything from the header
except the UDP source port. Is that a fair summary?


For example, for the same flow passing through different VXLAN tunnels,
packets in this flow have the same inner header and different outer
headers. Sometimes these packets of the flow need to be hashed to the
same rxq, then we can use the inner header as the hash input.

Thanks!

So, they will have the same source port yes?

Yes. The outer source port can be calculated using the 5-tuple of the
original packet,
and the outer ports are the same but the outer IPs are different after
different directions of the same flow pass through different tunnels.

Any way to use that

We use it in monitoring, firewall and other scenarios.


so we don't depend on a specific protocol?

Yes, selected tunneling protocols can be used in this scenario like this.

Thanks.


No, the question was - can we generalize this somehow then?
For example, a flag to ignore source IP when hashing?
Or maybe just for UDP packets?

1. I think the common solution is based on the inner header, so that
GRE/IPIP tunnels can also enjoy inner symmetric hashing.

2. The VXLAN spec does not show that the outer source port in both
directions of the same flow must be the same [1]
(although the outer source port is calculated based on the consistent hash
in the kernel. The consistent hash will sort the five-tuple before
calculating hashing),
but it is best not to assume that consistent hashing is used in all VXLAN
implementations.

I agree, best not to assume if it's not in the spec.
The requirement to hash two sides to same queue might
not be necessary for everyone though, right?

The outer source port is also not reliable when it needs to be hashed to
the same queue, but the inner header identifies a flow reliably and
universally.


The GENEVE spec uses "SHOULD" [2].

What about other tunnels? Could you summarize please?

Sure.

The VXLAN spec [1] does not state that the outer source port must be the
same in both directions of the same flow.

VXLAN-GPE [2] ("SHOULD") / GENEVE [3] ("SHOULD") / GRE-in-UDP [4.1] / STT [5]
recommend that the outer source port of the same flow be calculated from
a hash of the inner header and set to the same value.

But the UDP source port of GRE-in-UDP may be used in a scenario similar
to NAPT [4.2], where the UDP source port is no longer used for entropy
but for identifying different internal hosts. So the UDP source port
cannot reliably identify a flow. This is why using the inner header is
more general: information from the original flow identifies it reliably.

[1] "Source Port: It is recommended tha

[virtio-dev] [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-22 Thread Heng Qi


Currently, the VIRTIO_NET_F_GUEST_CSUM (NETIF_F_RXCSUM) feature of the virtio-net
driver conflicts with loading an XDP program. This is caused by the problem
described in [1][2]: XDP may corrupt the partial-csum-related fields, resulting
in packet drops. rx CHECKSUM_PARTIAL mainly exists in virtualized environments,
and its purpose is to save computing overhead.

The *goal* of this proposal is to enable the coexistence of XDP and
VIRTIO_NET_F_GUEST_CSUM.

1. We need to understand why a device driver receives rx CHECKSUM_PARTIAL
packets.

Drivers in virtualized environments, such as virtio-net/veth/loopback,
may receive partially checksummed packets.

When the tx device finds that the destination rx device of a packet is
located on the same host, the packet clearly may not pass through a
physical link, so the tx device sends the packet with csum_{start, offset}
directly to the rx side to save computational resources, without computing
the full csum (this depends on the specific implementation; some virtio-net
backend devices are known to behave this way currently). From [3], the
stack trusts such packets.
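As a hedged illustration of what completing such a partially checksummed packet involves: fold the 16-bit one's-complement sum from csum_start to the end of the packet and store the result at csum_start + csum_offset. The field names mirror struct virtio_net_hdr; the packet bytes below are fabricated, and for simplicity the checksum field starts at zero (a real CHECKSUM_PARTIAL packet holds the pseudo-header sum there, which the same summation then includes).

```python
def csum_fold(data: bytes) -> int:
    """16-bit one's-complement Internet checksum of a byte string."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd length
    s = sum(int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2))
    while s >> 16:                           # fold carries
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def complete_partial_csum(pkt: bytearray, csum_start: int, csum_offset: int):
    # Sum from csum_start onward (the checksum field is included as-is)
    # and write the folded result into the field at csum_offset.
    c = csum_fold(bytes(pkt[csum_start:]))
    pkt[csum_start + csum_offset:csum_start + csum_offset + 2] = \
        c.to_bytes(2, "big")

pkt = bytearray(b"\x12\x34\x00\x00\x56\x78")  # checksum field at offset 2
complete_partial_csum(pkt, 0, 2)
# A completed region folds to zero -- the classic verification property.
assert csum_fold(bytes(pkt)) == 0
```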

However, veth still has NETIF_F_RXCSUM turned on when loading XDP. This may
cause packet drops as [1][2] stated. But the veth community does not
currently seem to have reported such problems; can we guess that the
coexistence of XDP and rx CHECKSUM_PARTIAL has less negative impact?

2. About rx CHECKSUM_UNNECESSARY:

We have just seen that in a virtualized environment a packet may flow within
the same host, so not computing the complete csum for the packet saves some
CPU resources.

The purpose of the checksum is to verify that packets passing through a
physical link are correct. Of course, it is also reasonable to compute the
full csum for packets in a virtualized environment, which is exactly what
we need.

rx CHECKSUM_UNNECESSARY indicates that the packet has been fully checked,
that is, it is a trustworthy packet. If such a packet is modified by an XDP
program, the user should recalculate the correct checksum using
bpf_csum_diff() and bpf_{l3,l4}_csum_replace().
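The arithmetic behind such incremental recalculation is RFC 1624's HC' = ~(~HC + ~m + m') for an old 16-bit word m replaced by m'. A minimal sketch of that update, verified against a full recomputation (note bpf_l4_csum_replace() applies to skb-based programs; a pure XDP program typically pairs bpf_csum_diff() with a manual store to the same effect):

```python
def ones_add(a: int, b: int) -> int:
    """One's-complement 16-bit addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def csum_compute(words) -> int:
    """Full checksum over a list of 16-bit words."""
    s = 0
    for w in words:
        s = ones_add(s, w)
    return ~s & 0xFFFF

def csum_replace16(check: int, old: int, new: int) -> int:
    """RFC 1624 incremental update: HC' = ~(~HC + ~m + m')."""
    return ~ones_add(ones_add(~check & 0xFFFF, ~old & 0xFFFF), new) & 0xFFFF

old_csum = csum_compute([0x1234, 0x5678])
# Replace the word 0x5678 with 0x9ABC; the incremental result matches
# a full recomputation over the modified packet words.
assert csum_replace16(old_csum, 0x5678, 0x9ABC) == \
       csum_compute([0x1234, 0x9ABC])
```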

Therefore, for drivers (physical NIC drivers?) such as atlantic/bnxt/mlx,
XDP and NETIF_F_RXCSUM coexist, because their packets are fully checked on
the tx side.

AWS's ena driver is also designed around this full-checksum mode (we also
mention below that a feature bit can be provided for virtio-net, telling the
sender that a full checksum must be calculated, to implement behavior
similar to other drivers), although it is in a virtualized environment.

3. To sum up:

It seems that only virtio-net makes XDP and VIRTIO_NET_F_GUEST_CSUM mutually
exclusive, which may cause the following problems when XDP loads:

1) Packets that are fully checked by the sender are marked as
CHECKSUM_UNNECESSARY by rx csum hw offloading.

With that offload disabled, the virtio-net driver needs additional CPU
resources to compute the checksum for every packet.

When testing with the following command in Aliyun ECS (mtu = 1500,
dev-layer GRO on):
qperf dst_ip -lp 8989 -m 64K -t 20 tcp_bw

the csum-related overhead we measured is 11.7% on x86 and 15.8% on ARM.

2) One of the main uses of an XDP program is monitoring or firewalling,
meaning the XDP program may not modify the packet at all. This applies to
both rx CHECKSUM_PARTIAL and rx CHECKSUM_UNNECESSARY, yet we discard the
rx csum hw offloading capability in both cases.

4. How we try to solve this:

1) Add a feature bit to the virtio specification telling the sender that a
fully csumed packet must be sent. XDP can then coexist with
VIRTIO_NET_F_GUEST_CSUM when this feature bit is negotiated (similar to
ENA behavior).
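A sketch of the policy this option would enable in the driver, with the proposed bit modeled as a plain flag. VIRTIO_NET_F_UNNECESSARY_CSUM is a name suggested later in this discussion, not an existing spec feature; real code would query negotiated features instead of booleans.

```python
def xdp_can_keep_guest_csum(guest_csum: bool, unnecessary_csum: bool) -> bool:
    # Today's driver: loading XDP requires clearing GUEST_CSUM entirely.
    # With the proposed bit negotiated, the device never delivers
    # NEEDS_CSUM packets, so GUEST_CSUM can stay on under XDP.
    return guest_csum and unnecessary_csum

assert not xdp_can_keep_guest_csum(True, False)  # current situation
assert xdp_can_keep_guest_csum(True, True)       # proposed coexistence
```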

2) Modify the current virtio-net driver:

No longer filter out the VIRTIO_NET_F_GUEST_CSUM feature in
virtnet_xdp_set(). Then we immediately regain the benefit of
VIRTIO_NET_F_GUEST_CSUM and the CPU resources saved by rx csum hw
offloading. (This method is a bit crude.)

5. Ending

This is a proposal, not a formal solution. We look forward to feedback from
the community and to exploring a possible common solution to the problem
described in this proposal.

6. References

[1] 18ba58e1c234

virtio-net: fail XDP set if guest csum is negotiated

We don't support partial csumed packet since its metadata will be lost
or incorrect during XDP processing. So fail the XDP set if guest_csum
feature is negotiated.

[2] e59ff2c49ae1

virtio-net: disable guest csum during XDP set

We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
can receive partial csumed packets with metadata kept in the
vnet_hdr. This may have several side effects:

- It could be overridden by header adjustment, thus is might be not
  correct after XDP processing.
- There's no way to pass such metadata information through
  XDP_REDI

[virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-22 Thread Heng Qi
On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > 1) Add a feature bit to the virtio specification to tell the sender that a 
> > fully
> > csumed packet must be sent.
> 
> Who is the sender in this picture? The driver?

The device or the driver.

When the device is hw, the sender is more likely to be the device.
When the device is sw, the sender can be either the device or the driver.

But in general, this feature is intended to constrain the behavior of the
device and the driver from the receiving side.

For example: 
VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you must send 
me a fully csumed packet.

Then the specific implementation can be

(1) the sender sends a fully csumed packet;
(2) the receiver receives a CHECKSUM_PARTIAL packet, and the device helps
calculate the full csum (because the two parties in the communication are
located on the same host, the packet is trusted).

In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver will no 
longer receive any packets marked CHECKSUM_PARTIAL.

Thanks.

> 
> -- 
> MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-23 Thread Heng Qi
On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > 1) Add a feature bit to the virtio specification to tell the sender 
> > > > that a fully
> > > > csumed packet must be sent.
> > > 
> > > Who is the sender in this picture? The driver?
> > 
> > The device or the driver.
> > 
> > When the device is hw, the sender is more likely to be a device.
> > When the device is sw, the sender can be a device or a driver.
> >
> > But in general, this feature is inclined to constrain the behavior of the 
> > device and
> > the driver from the receiving side.
> 
> Based on above I am guessing you are talking about driver getting
> packets from device, I wish you used terms from virtio spec.

Yes, I'm going to use the terminology of the virtio spec.

> 
> > For example: 
> > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you must 
> > send me a fully csumed packet.
> > 
> > Then the specific implementation can be
> > 
> > (1) the sender sends a fully csumed packet;
> > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device helps 
> > calculate the fully csum
> > (because the two parties in the communication are located on the same 
> > host, the packet is trusted.).
> > 
> > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver will 
> > no longer receive any packets marked CHECKSUM_PARTIAL.
> > 
> > Thanks.
> 
> This is what clearing VIRTIO_NET_F_GUEST_CSUM does.

Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can
receive a fully checksummed packet, we can no longer enjoy
the device's ability to validate the packet checksum. That is, the value
of \field{flags} in the virtio_net_hdr structure is set to 0, which means
that the packet received by the driver will not be marked as
VIRTIO_NET_HDR_F_DATA_VALID.

So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the
driver a fully checksummed packet, and the packet is validated by the
device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.

> 
> I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM
> disables all offloads but you want to keep some of them?
> 

No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is needed
in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated,
then the driver may always receive packets marked as
VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the
same time.

> Again please use virtio terminology not Linux. to help you out,
> in current linux, VIRTIO_NET_HDR_F_NEEDS_CSUM and VIRTIO_NET_HDR_F_DATA_VALID
> will set CHECKSUM_PARTIAL and CHECKSUM_UNNECESSARY respectively.
> 

Sure. Will do as you suggested.

Thanks.

> 
> -- 
> MST
> 
> 




[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-23 Thread Heng Qi
On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote:
> > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > > > 1) Add a feature bit to the virtio specification to tell the sender 
> > > > > > that a fully
> > > > > > csumed packet must be sent.
> > > > > 
> > > > > Who is the sender in this picture? The driver?
> > > > 
> > > > The device or the driver.
> > > > 
> > > > When the device is hw, the sender is more likely to be a device.
> > > > When the device is sw, the sender can be a device or a driver.
> > > >
> > > > But in general, this feature is inclined to constrain the behavior of 
> > > > the device and
> > > > the driver from the receiving side.
> > > 
> > > Based on above I am guessing you are talking about driver getting
> > > packets from device, I wish you used terms from virtio spec.
> > 
> > Yes, I'm going to use the terminology of the virtio spec.
> > 
> > > 
> > > > For example: 
> > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that you 
> > > > must send me a fully csumed packet.
> > > > 
> > > > Then the specific implementation can be
> > > > 
> > > > (1) the sender sends a fully csumed packet;
> > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device 
> > > > helps calculate the fully csum
> > > > (because the two parties in the communication are located on the 
> > > > same host, the packet is trusted.).
> > > > 
> > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the driver 
> > > > will no longer receive any packets marked CHECKSUM_PARTIAL.
> > > > 
> > > > Thanks.
> > > 
> > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does.
> > 
> > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can
> > receive a fully checksummed packet, we can no longer enjoy
> > the device's ability to validate the packet checksum. That is, the value
> > of \field{flags} in the virtio_net_hdr structure is set to 0, which means
> > that the packet received by the driver will not be marked as
> > VIRTIO_NET_HDR_F_DATA_VALID.
> > 
> > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
> > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the
> > driver a fully checksummed packet, and the packet is validated by the
> > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.
> > 
> > > 
> > > I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM
> > > disables all offloads but you want to keep some of them?
> > > 
> > 
> > No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is needed
> > in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated,
> > then the driver may always receive packets marked as
> > VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the
> > same time.
> 
> Makes no sense to me. VIRTIO_NET_F_GUEST_CSUM set already allows
> VIRTIO_NET_HDR_F_DATA_VALID:

We need to focus on what happens when the XDP program is loaded:

The current virtio-net driver needs to turn off the VIRTIO_NET_F_GUEST_CSUM
feature when loading XDP. The main reason is that VIRTIO_NET_F_GUEST_CSUM
allows receiving packets marked as VIRTIO_NET_HDR_F_NEEDS_CSUM. Such packets
are not compatible with XDP programs, because we cannot guarantee that the
csum_{start, offset} fields remain correct after XDP modifies the packets.

So in order for the driver to continue to enjoy the device's ability to
validate packets while XDP is loaded, we need a new feature to tell the
device not to deliver packets marked as VIRTIO_NET_HDR_F_NEEDS_CSUM (in
other words, the device can only deliver packets marked as
VIRTIO_NET_HDR_F_DATA_VALID).
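The incompatibility described above reduces to a per-packet rule. A minimal sketch using the flag values from the virtio spec (VIRTIO_NET_HDR_F_NEEDS_CSUM = 1, VIRTIO_NET_HDR_F_DATA_VALID = 2); the helper name is made up for illustration:

```python
VIRTIO_NET_HDR_F_NEEDS_CSUM = 1  # csum_start/csum_offset are meaningful
VIRTIO_NET_HDR_F_DATA_VALID = 2  # checksum already validated by device

def xdp_safe_without_recompute(flags: int) -> bool:
    # NEEDS_CSUM packets carry csum metadata that XDP header rewrites
    # may invalidate; fully validated packets carry no such metadata.
    return not (flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)

assert xdp_safe_without_recompute(VIRTIO_NET_HDR_F_DATA_VALID)
assert not xdp_safe_without_recompute(VIRTIO_NET_HDR_F_NEEDS_CSUM)
```

Under the proposed feature the device would only ever set DATA_VALID, so every received packet passes this check.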

Thanks.

> \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
>   VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags} can be
>   set: if so, the packet checksum at offset \field{csum_offset}
>   from \field{csum_start} and any preceding checksums
>   have been validated.  The checksum on the packet is incomplete and
>   if bit VIR

Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-24 Thread Heng Qi
On Wed, May 24, 2023 at 02:07:14AM -0400, Michael S. Tsirkin wrote:
> On Tue, May 23, 2023 at 09:51:44PM +0800, Heng Qi wrote:
> > On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote:
> > > > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > > > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > > > > > 1) Add a feature bit to the virtio specification to tell the 
> > > > > > > > sender that a fully
> > > > > > > > csumed packet must be sent.
> > > > > > > 
> > > > > > > Who is the sender in this picture? The driver?
> > > > > > 
> > > > > > The device or the driver.
> > > > > > 
> > > > > > When the device is hw, the sender is more likely to be a device.
> > > > > > When the device is sw, the sender can be a device or a driver.
> > > > > >
> > > > > > But in general, this feature is inclined to constrain the behavior 
> > > > > > of the device and
> > > > > > the driver from the receiving side.
> > > > > 
> > > > > Based on above I am guessing you are talking about driver getting
> > > > > packets from device, I wish you used terms from virtio spec.
> > > > 
> > > > Yes, I'm going to use the terminology of the virtio spec.
> > > > 
> > > > > 
> > > > > > For example: 
> > > > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device that 
> > > > > > you must send me a fully csumed packet.
> > > > > > 
> > > > > > Then the specific implementation can be
> > > > > > 
> > > > > > (1) the sender sends a fully csumed packet;
> > > > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the device 
> > > > > > helps calculate the fully csum
> > > > > > (because the two parties in the communication are located on 
> > > > > > the same host, the packet is trusted.).
> > > > > > 
> > > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the 
> > > > > > driver will no longer receive any packets marked CHECKSUM_PARTIAL.
> > > > > > 
> > > > > > Thanks.
> > > > > 
> > > > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does.
> > > > 
> > > > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device can
> > > > receive a fully checksummed packet, we can no longer enjoy
> > > > the device's ability to validate the packet checksum. That is, the value
> > > > of \field{flags} in the virtio_net_hdr structure is set to 0, which 
> > > > means
> > > > that the packet received by the driver will not be marked as
> > > > VIRTIO_NET_HDR_F_DATA_VALID.
> > > > 
> > > > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
> > > > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must give the
> > > > driver a fully checksummed packet, and the packet is validated by the
> > > > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.
> > > > 
> > > > > 
> > > > > I feel you are trying to say that clearing VIRTIO_NET_F_GUEST_CSUM
> > > > > disables all offloads but you want to keep some of them?
> > > > > 
> > > > 
> > > > No, what I mean is that a feature VIRTIO_NET_F_UNNECESSARY_CSUM is 
> > > > needed
> > > > in addition to VIRTIO_NET_F_GUEST_CSUM, if both features are negotiated,
> > > > then the driver may always receive packets marked as
> > > > VIRTIO_NET_HDR_F_DATA_VALID, which means that we can now load XDP at the
> > > > same time.
> > > 
> > > Makes no sense to me. VIRTIO_NET_F_GUEST_CSUM set already allows
> > > VIRTIO_NET_HDR_F_DATA_VALID:
> > 
> > We need to focus on what happens when the XDP program is loaded:
> > 
> > The current virtio-net needs to turn off the VIRTIO_NET_F_GUEST_CSUM
> > feature when loading XDP. The main reason for doing 

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [Proposal] Relationship between XDP and rx-csum in virtio-net

2023-05-30 Thread Heng Qi
On Tue, May 30, 2023 at 03:33:22PM -0400, Michael S. Tsirkin wrote:
> On Wed, May 24, 2023 at 04:12:46PM +0800, Heng Qi wrote:
> > On Wed, May 24, 2023 at 02:07:14AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 23, 2023 at 09:51:44PM +0800, Heng Qi wrote:
> > > > On Tue, May 23, 2023 at 09:30:28AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 23, 2023 at 05:18:20PM +0800, Heng Qi wrote:
> > > > > > On Tue, May 23, 2023 at 03:15:37AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 23, 2023 at 10:41:18AM +0800, Heng Qi wrote:
> > > > > > > > On Mon, May 22, 2023 at 03:10:05PM -0400, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Mon, May 22, 2023 at 08:12:00PM +0800, Heng Qi wrote:
> > > > > > > > > > 1) Add a feature bit to the virtio specification to tell 
> > > > > > > > > > the sender that a fully
> > > > > > > > > > csumed packet must be sent.
> > > > > > > > > 
> > > > > > > > > Who is the sender in this picture? The driver?
> > > > > > > > 
> > > > > > > > The device or the driver.
> > > > > > > > 
> > > > > > > > When the device is hw, the sender is more likely to be a device.
> > > > > > > > When the device is sw, the sender can be a device or a driver.
> > > > > > > >
> > > > > > > > But in general, this feature is inclined to constrain the 
> > > > > > > > behavior of the device and
> > > > > > > > the driver from the receiving side.
> > > > > > > 
> > > > > > > Based on above I am guessing you are talking about driver getting
> > > > > > > packets from device, I wish you used terms from virtio spec.
> > > > > > 
> > > > > > Yes, I'm going to use the terminology of the virtio spec.
> > > > > > 
> > > > > > > 
> > > > > > > > For example: 
> > > > > > > > VIRTIO_NET_F_UNNECESSARY_CSUM : The driver tells the device 
> > > > > > > > that you must send me a fully csumed packet.
> > > > > > > > 
> > > > > > > > Then the specific implementation can be
> > > > > > > > 
> > > > > > > > (1) the sender sends a fully csumed packet;
> > > > > > > > (2) the receiver receives a CHECKSUM_PARTIAL packet, and the 
> > > > > > > > device helps calculate the fully csum
> > > > > > > > (because the two parties in the communication are located 
> > > > > > > > on the same host, the packet is trusted.).
> > > > > > > > 
> > > > > > > > In summary, if VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the 
> > > > > > > > driver will no longer receive any packets marked 
> > > > > > > > CHECKSUM_PARTIAL.
> > > > > > > > 
> > > > > > > > Thanks.
> > > > > > > 
> > > > > > > This is what clearing VIRTIO_NET_F_GUEST_CSUM does.
> > > > > > 
> > > > > > Yes, but with VIRTIO_NET_F_GUEST_CSUM cleared, although the device 
> > > > > > can
> > > > > > receive a fully checksummed packet, we can no longer enjoy
> > > > > > the device's ability to validate the packet checksum. That is, the 
> > > > > > value
> > > > > > of \field{flags} in the virtio_net_hdr structure is set to 0, which 
> > > > > > means
> > > > > > that the packet received by the driver will not be marked as
> > > > > > VIRTIO_NET_HDR_F_DATA_VALID.
> > > > > > 
> > > > > > So, we need a feature bit (let's say VIRTIO_NET_F_UNNECESSARY_CSUM).
> > > > > > If VIRTIO_NET_F_UNNECESSARY_CSUM is negotiated, the device must 
> > > > > > give the
> > > > > > driver a fully checksummed packet, and the packet is validated by 
> > > > > > the
> > > > > > device with \field{flags} set to VIRTIO_NET_HDR_F_DATA_VALID.
> > > > > > 
> > > > > > > 
> > > > > > > I fee

Re: [virtio-dev] Re: [virtio-comment] [PATCH v17] virtio-net: support inner header hash

2023-06-20 Thread Heng Qi
On Tue, Jun 20, 2023 at 08:06:16AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 12, 2023 at 04:09:20PM +0800, Heng Qi wrote:
> > 1. Currently, a received encapsulated packet has an outer and an inner 
> > header, but
> > the virtio device is unable to calculate the hash for the inner header. The 
> > same
> > flow can traverse through different tunnels, resulting in the encapsulated
> > packets being spread across multiple receive queues (refer to the figure 
> > below).
> > However, in certain scenarios, we may need to direct these encapsulated 
> > packets of
> > the same flow to a single receive queue. This facilitates the processing
> > of the flow by the same CPU to improve performance (warm caches, less 
> > locking, etc.).
> > 
> >client1client2
> >   |+---+ |
> >   +--->|tunnels|<+
> >+---+
> >   |  |
> >   v  v
> >   +-+
> >   | monitoring host |
> >   +-+
> > 
> > To achieve this, the device can calculate a symmetric hash based on the 
> > inner headers
> > of the same flow.
> > 
> > 2. For legacy systems, they may lack entropy fields which modern protocols 
> > have in
> > the outer header, resulting in multiple flows with the same outer header but
> > different inner headers being directed to the same receive queue. This 
> > results in
> > poor receive performance.
> > 
> > To address this limitation, inner header hash can be used to enable the 
> > device to advertise
> > the capability to calculate the hash for the inner packet, regaining better 
> > receive performance.
> > 
> > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/173
> > 
> > Signed-off-by: Heng Qi 
> > Reviewed-by: Xuan Zhuo 
> > Reviewed-by: Parav Pandit 
> 
> 
> Looks good but small rewording suggestions:
> 
> > ---
> > v16->v17:
> > 1. Some small rewrites. @Parav Pandit
> > 2. Add Parav's Reviewed-by tag (Thanks!).
> > 
> > v15->v16:
> > 1. Remove the hash_option. In order to delimit the inner header hash 
> > and RSS
> >configuration, the ability to configure the outer src udp port hash 
> > is given
> >to RSS. This is orthogonal to inner header hash, which will be done 
> > in the
> >RSS capability extension topic (considered as an RSS extension 
> > together
> >with the symmetric toeplitz hash algorithm, etc.). @Parav Pandit 
> > @Michael S . Tsirkin
> 
> Hmm maybe. I'd like the TC see this patch though, they are designed to
> work together. Can we see at least a draft? I have been burned

I've drawn up a simple draft on hashing based on the outer src port, but not
yet on symmetric Toeplitz hashing. I'll send it out in about 4 weeks, but
please don't delay this work. You have my word.

> many times by contributors just addressing what they
> care about, promising to follow through with other work and
> never bothering.

I got it.

> 
> > 2. Fix a 'field' typo. @Parav Pandit
> > 
> > v14->v15:
> > 1. Add tunnel hash option suggested by @Michael S . Tsirkin
> > 2. Adjust some descriptions.
> > 
> > v13->v14:
> > 1. Move supported_hash_tunnel_types from config space into cvq command. 
> > @Parav Pandit
> > 2. Rebase to master branch.
> > 3. Some minor modifications.
> > 
> > v12->v13:
> > 1. Add a GET command for hash_tunnel_types. @Parav Pandit
> > 2. Add tunneling protocol explanation. @Jason Wang
> > 3. Add comments on some usage scenarios for inner hash.
> > 
> > v11->v12:
> > 1. Add a command VIRTIO_NET_CTRL_MQ_TUNNEL_CONFIG.
> > 2. Refine the commit log. @Michael S . Tsirkin
> > 3. Add some tunnel types.
> > 
> > v10->v11:
> > 1. Revise commit log for clarity for readers.
> > 2. Some modifications to avoid undefined terms. @Parav Pandit
> > 3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> > 4. Add the normative statements. @Parav Pandit
> > 
> > v9->v10:
> > 1. Removed hash_report_tunnel related information. @Parav Pandit
> > 2. Re-describe the limitations of QoS for tunneling.
> > 3. Some clarification.
> > 
> > v8->v9:
> > 1. Merge hash_re

Re: [virtio-dev] RE: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash

2023-06-21 Thread Heng Qi




在 2023/6/22 上午4:52, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, June 21, 2023 4:38 PM

And the field is RO so no memory cost to exposing it in all VFs.

Two structures do not bring the asymmetry.
Accessing current and enabled fields via two different mechanism is bringing

the asymmetry.

I guess it's a matter of taste, but it is clearly more consistent with other
hash things, to which it's very similar.


This is consistent with new commands we define, including notification
coalescing, whose GET does not come from config space.


Yes.

  

Nah, config space is too convenient when we can live with its limitations. I
don't think we prefer not to keep growing it.
For some things such as this one it's perfect.


Fields are different between different devices.


For example, for migration the driver might want to validate that two
devices have the same capability. Doing it without DMA is nicer.


A migration driver, in a real-world scenario, will almost certainly have to
use DMA for the amount of data it needs to exchange.


Another example: a future admin transport will have the ability to provision
devices by supplying their config space.
This will include this capability automatically; if instead we hide it in a
command, we need to do extra custom work.


So we do not prefer to keep growing the config space anymore, hence
GET is the right approach to me.

Heh I know you hate config space. Let it go, stop wasting time arguing about the
same thing on every turn and instead help define admin transport to solve it

This was discussed many times, a driver to have a direct (non-intercepted by 
owner device) channel to device.
If you mean this non-intercepted channel as admin transport, fine.
If you mean this is intercepted and it is going over admin cmd, then it is of 
no use for all future interfaces.

We discussed this in a thread with you and Jason.
I provided a concrete example with size and device-provisioning math, and
another example of a multi-physical-address VQ.
So transporting register by register over some admin transport is sub-optimal.


Parav, your implementation prefers two separate struct versions and doesn't
let supported_hash_tunnel_types expand in the configuration space. I remember
this. I agree that we don't want to jump back and forth, especially as there
are practical reasons and it took 5 version jumps to get
supported_hash_tunnel_types back into the config space.


The original motivation of Michael's proposal to merge the structures in v18
seems to be that two separate structures would get out of sync.
I don't think so: the driver can cache the enabled hash_tunnel_types on
every SET command. Alternatively, after the SET command the driver *SHOULD*
issue the GET command again, which is the workaround.
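The caching idea can be sketched as follows; the class and method names are made up for illustration and are not from the virtio spec.

```python
class FakeDevice:
    """Stand-in for the device side of the control virtqueue."""
    def __init__(self):
        self.enabled = 0
    def send_set(self, types: int):
        self.enabled = types  # device records the enabled tunnel types

class TunnelHashState:
    """Driver-side mirror of the enabled hash_tunnel_types."""
    def __init__(self):
        self.enabled_tunnel_types = 0
    def set_tunnel_types(self, device, types: int):
        device.send_set(types)             # e.g. the cvq SET command
        self.enabled_tunnel_types = types  # cache: no GET round-trip needed

dev = FakeDevice()
st = TunnelHashState()
st.set_tunnel_types(dev, 0b101)
# Driver cache and device state agree without issuing a GET.
assert st.enabled_tunnel_types == dev.enabled == 0b101
```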


Thanks.









[virtio-dev] Re: [virtio-comment] Re: [PATCH v18] virtio-net: support inner header hash

2023-06-22 Thread Heng Qi




在 2023/6/22 下午8:32, Parav Pandit 写道:



From: Michael S. Tsirkin 
Sent: Thursday, June 22, 2023 2:23 AM

On Wed, Jun 21, 2023 at 08:52:04PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, June 21, 2023 4:38 PM

And the field is RO so no memory cost to exposing it in all VFs.

Two structures do not bring the asymmetry.
Accessing current and enabled fields via two different mechanism
is bringing

the asymmetry.

I guess it's a matter of taste, but it is clearly more consistent
with other hash things, to which it's very similar.


This is consistent with new commands we define including notification

coalescing whose GET is not coming config space.

But there GET just reports the current state. Not the read only capability. So
there would be cost per VF to keep it in config space.
This one is RO no cost per VF. Let's make it convenient?


And each VF can have different value hence requires per VF storage in the 
device.
  

Nah, config space is too convenient when we can live with its
limitations. I don't think we prefer not to keep growing it.
For some things such as this one it's perfect.


Fields are different between different devices.

Not sure what's the implication?

The implication is that the device needs to store this in always-available
on-chip memory, which is not good.


For example, for migration driver might want to validate that two
devices have same capability. doing it without dma is nicer.


A migration driver for real world scenario, will almost have to use the dma for

amount of data it needs to exchange.

Not migration itself, provisioning.


A provisioning driver usually does not attach to the member device directly.
This requires a device reset, followed by reaching the _DRIVER stage,
querying features etc. and the config area, then unbinding it and a second
reset by the member driver. Ugh.

A provisioning driver also needs to get the state or capabilities even when
the member driver is already attached.
So config space is not much of a gain either.
  

Another example: the future admin transport will have the ability to
provision devices by supplying their config space.
This will include this capability automatically; if instead we hide
it in a command, we need to do extra custom work.


So we do not prefer to keep growing the config space anymore,
hence GET is the right approach to me.

Heh I know you hate config space. Let it go, stop wasting time
arguing about the same thing on every turn and instead help define
admin transport to solve it

This was discussed many times, a driver to have a direct (non-intercepted by

owner device) channel to device.

If you mean this non-intercepted channel as admin transport, fine.

we can do that, sure.


If you mean this is intercepted and it is going over admin cmd, then it is of no

use for all future interfaces.

We discussed this in thread with you and Jason.
I provided a concrete example with size and device provisioning math too, and
another example of a multi-physical-address VQ.

So transporting register by register over some admin transport is sub-optimal.


Not register by register, we can send all of config space as long as it's RO.
This field is.


It is RO in the context of one member device, but every VF can have a different value.
The device will never know whether one driver will use the new cmdvq to access it
or some old driver will use it without.
And hence, it always needs to be provisioned in on-chip memory for backward
compatibility.


Yes, I think we also have to consider upcoming
    1. device counters (e.g. supported_device_counter),
    2. receive flow filters (e.g. supported_flow_types, supported_max_entries),
    3. header splits (e.g. supported_split_types), etc.
Continued expansion of the configuration space needs to be handled carefully.



Instead of the decision point being RO vs RW,
putting any new fields via cmdvq while existing fields stay in config space gives
predictable behavior for sizing the member devices in the system.
Once the cmdvq is available, we can get rid of the GET command used in this version
for new future features.
Till that arrives, the GET command is the efficient way.


Yes, I agree.

Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Heng Qi




在 2023/6/29 上午9:56, Parav Pandit 写道:



From: Michael S. Tsirkin 
Sent: Wednesday, June 28, 2023 3:45 PM

Maybe I get it. You want to use the new features as a carrot to
force drivers to implement DMA? You suspect they will ignore the
spec requirement just because things seem to work?


Right because it is not a must normative.

Well SHOULD also does not mean "ok to just ignore".

This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.


RECOMMENDED and SHOULD force the device to support MMIO, which is not good.
Rather, a good design is that the device tells the starting offset of the extended
config space, and the extended config space MUST be accessed using DMA.
With this, software can have an effectively infinite-size MMIO window, and a hw
device forces DMA based on its implementation's choice of where DMA starts.
This also gives the ability to keep the current config as MMIO for backward
compatibility.
  



There's some logic here, for sure. You just might be right.

However, surely we can discuss this small tweak in 1.4 timeframe?

Sure, if we prefer the DMA approach I don't have a problem in adding

temporary one field to config space.

I propose to add a line to the spec " Device Configuration Space"
section, something like,

Note: Any new device configuration space field additions MUST consider
accessing such fields via a DMA interface.

And this will guide new patches on what to do, instead of a last-moment rush.

Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching to
MMIO might be an option for qemu helping us debug DMA issues.


There are too many queues that need debugging, and MMIO is likely not the way
to debug them.
  

The time to discuss this detail would be around when proposal for the DMA
access to config space is on list though: I feel this SHOULD vs MUST is a small
enough detail.


 From an implementation POV, it is certainly critical and a good step forward to
optimize the virtio interface.
  

Going back to inner hash. If we move supported_tunnels back to config space,
do you feel we still need GET or just drop it? I note we do not have GET for
either hash or rss config.


For hash and rss config, debugging is missing. :)
Yes, we can drop the GET after switching supported_tunnels to struct 
virtio_net_hash_config.


Great! Glad to hear this!

  

And if we no longer have GET is there still a reason for a separate command as
opposed to a field in virtio_net_hash_config?
I know this was done in v11 but there it was misaligned.
We went with a command because we needed it for supported_tunnels but
now that is no longer the case and there are reserved words in
virtio_net_hash_config ...

Let me know how you feel about that, not critical for me.

struct virtio_net_hash_config reserved is fine.


+1.

Inner header hash is orthogonal to RSS, and it's fine to have its own 
structure and commands.
There is no need to send additional RSS fields when we configure inner 
header hash.


Thanks.






[virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-28 Thread Heng Qi
On Thu, Jun 29, 2023 at 01:56:34AM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Wednesday, June 28, 2023 3:45 PM
> > > > Maybe I get it. You want to use the new features as a carrot to
> > > > force drivers to implement DMA? You suspect they will ignore the
> > > > spec requirement just because things seem to work?
> > > >
> > > Right because it is not a must normative.
> > 
> > Well SHOULD also does not mean "ok to just ignore".
> > 
> > This word, or the adjective "RECOMMENDED", mean that there
> >may exist valid reasons in particular circumstances to ignore a
> >particular item, but the full implications must be understood and
> >carefully weighed before choosing a different course.
> >
> RECOMMENDED and SHOULD forces the device to support MMIO, which is not good.
> So rather a good design is device tells the starting offset for the extended 
> config space.
> And extended config space MUST be accessed using a DMA.
> With this sw can have infinite size MMIO and hw device forces DMA based on 
> its implementation of where to start DMA from.
> This also gives the ability to maintain current config as MMIO for backward 
> compatibility.
>  
> > 
> > 
> > > > There's some logic here, for sure. you just might be right.
> > > >
> > > > However, surely we can discuss this small tweak in 1.4 timeframe?
> > >
> > > Sure, if we prefer the DMA approach I don't have a problem in adding
> > temporary one field to config space.
> > >
> > > I propose to add a line to the spec " Device Configuration Space"
> > > section, something like,
> > >
> > > Note: Any new device configuration space fields additional MUST consider
> > accessing such fields via a DMA interface.
> > >
> > > And this will guide the new patches of what to do instead of last moment
> > rush.
> > 
> > Yea, except again I'd probably make it a SHOULD: e.g. I can see how 
> > switching to
> > MMIO might be an option for qemu helping us debug DMA issues.
> >
> There are too many queues whose debugging is needed and MMIO likely not the 
> way to debug.
>  
> > The time to discuss this detail would be around when proposal for the DMA
> > access to config space is on list though: I feel this SHOULD vs MUST is a 
> > small
> > enough detail.
> >
> From implementation POV it is certainly critical and good step forward to 
> optimize virtio interface.
>  
> > Going back to inner hash. If we move supported_tunnels back to config space,
> > do you feel we still need GET or just drop it? I note we do not have GET for
> > either hash or rss config.
> >
> For hash and rss config, debugging is missing. :)
> Yes, we can drop the GET after switching supported_tunnels to struct 
> virtio_net_hash_config.
>  

I would like to make sure we're aligned. The new version should contain the following:
1. The supported_tunnel_types are placed in the device config space;
2. Reserve the following structure:

 struct virtnet_hash_tunnel {
le32 enabled_tunnel_types;
 };

3. Reserve the SET command for enabled_tunnel_types and remove the GET command 
for enabled_tunnel_types.

If there is no problem, I will modify it accordingly.

Thanks!

> > And if we no longer have GET is there still a reason for a separate command 
> > as
> > opposed to a field in virtio_net_hash_config?
> > I know this was done in v11 but there it was misaligned.
> > We went with a command because we needed it for supported_tunnels but
> > now that is no longer the case and there are reserved words in
> > virtio_net_hash_config ...
> > 
> > Let me know how you feel it about that, not critical for me.
> 
> struct virtio_net_hash_config reserved is fine.




Re: [virtio-dev] Re: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-29 Thread Heng Qi




在 2023/6/29 下午7:48, Michael S. Tsirkin 写道:

On Thu, Jun 29, 2023 at 10:05:09AM +0800, Heng Qi wrote:


在 2023/6/29 上午9:56, Parav Pandit 写道:

From: Michael S. Tsirkin 
Sent: Wednesday, June 28, 2023 3:45 PM

Maybe I get it. You want to use the new features as a carrot to
force drivers to implement DMA? You suspect they will ignore the
spec requirement just because things seem to work?


Right because it is not a must normative.

Well SHOULD also does not mean "ok to just ignore".

This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.


RECOMMENDED and SHOULD force the device to support MMIO, which is not good.
Rather, a good design is that the device tells the starting offset of the extended
config space, and the extended config space MUST be accessed using DMA.
With this, software can have an effectively infinite-size MMIO window, and a hw
device forces DMA based on its implementation's choice of where DMA starts.
This also gives the ability to keep the current config as MMIO for backward
compatibility.

There's some logic here, for sure. You just might be right.

However, surely we can discuss this small tweak in 1.4 timeframe?

Sure, if we prefer the DMA approach I don't have a problem in adding

temporary one field to config space.

I propose to add a line to the spec " Device Configuration Space"
section, something like,

Note: Any new device configuration space field additions MUST consider
accessing such fields via a DMA interface.

And this will guide new patches on what to do, instead of a last-moment rush.

Yea, except again I'd probably make it a SHOULD: e.g. I can see how switching to
MMIO might be an option for qemu helping us debug DMA issues.


There are too many queues that need debugging, and MMIO is likely not the way
to debug them.

The time to discuss this detail would be around when proposal for the DMA
access to config space is on list though: I feel this SHOULD vs MUST is a small
enough detail.


  From an implementation POV, it is certainly critical and a good step forward to
optimize the virtio interface.

Going back to inner hash. If we move supported_tunnels back to config space,
do you feel we still need GET or just drop it? I note we do not have GET for
either hash or rss config.


For hash and rss config, debugging is missing. :)
Yes, we can drop the GET after switching supported_tunnels to struct 
virtio_net_hash_config.

Great! Glad to hear this!


And if we no longer have GET is there still a reason for a separate command as
opposed to a field in virtio_net_hash_config?
I know this was done in v11 but there it was misaligned.
We went with a command because we needed it for supported_tunnels but
now that is no longer the case and there are reserved words in
virtio_net_hash_config ...

Let me know how you feel about that, not critical for me.

struct virtio_net_hash_config reserved is fine.

+1.

Inner header hash is orthogonal to RSS, and it's fine to have its own
structure and commands.
There is no need to send additional RSS fields when we configure inner
header hash.

Thanks.

Not RSS, hash calculations. It's not critical, but I note that
practically you said you will enable this with symmetric hash
so it makes sense to me to send this in the same command


This works for me.

Thanks.


with the key.

Not critical though if there's opposition.







Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v19] virtio-net: support inner header hash

2023-06-29 Thread Heng Qi
On Thu, Jun 29, 2023 at 04:59:28PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Thursday, June 29, 2023 7:48 AM
> 
> 
> > > > struct virtio_net_hash_config reserved is fine.
> > >
> > > +1.
> > >
> > > Inner header hash is orthogonal to RSS, and it's fine to have its own
> > > structure and commands.
> > > There is no need to send additional RSS fields when we configure inner
> > > header hash.
> > >
> > > Thanks.
> > 
> > Not RSS, hash calculations. It's not critical, but I note that practically 
> > you said
> > you will enable this with symmetric hash so it makes sense to me to send 
> > this in
> > the same command with the key.
> > 
> 
> In the v19, we have,
> 
> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with 
> VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
> 
> So it is done along with rss, so in same struct as rss config is fine.

Do you mean having both virtio_net_rss_config and virtio_net_hash_config
have enabled_hash_types?
Like this:

struct virtio_net_rss_config {
 le32 hash_types;
 le16 indirection_table_mask;
 struct rss_rq_id unclassified_queue;
 struct rss_rq_id indirection_table[indirection_table_length];
 le16 max_tx_vq;
 u8 hash_key_length;
 u8 hash_key_data[hash_key_length];
+le32 enabled_tunnel_types; 
};

struct virtio_net_hash_config {
 le32 hash_types;
-le16 reserved[4];
+le32 enabled_tunnel_types;
+le16 reserved[2];
 u8 hash_key_length;
 u8 hash_key_data[hash_key_length];
};


If yes, this should have been discussed in v10 [1] before: enabled_tunnel_types
in virtio_net_rss_config would follow the variable-length field and cause misalignment.

If we let the inner header hash reuse the virtio_net_hash_config structure, it 
can work, but the only disadvantage
is that the configuration of the inner header hash and *RSS*(not hash 
calculations) becomes somewhat coupled.
Just imagine:
If the driver and the device negotiated VIRTIO_NET_F_HASH_TUNNEL and 
VIRTIO_NET_F_RSS, but did not negotiate VIRTIO_NET_F_HASH_REPORT,
1. then if we only want to configure the inner header hash (such as
enabled_tunnel_types), we can simply send virtio_net_hash_config alone;
2. but then if we want to configure both the inner header hash and RSS (such as
the indirection table), we need to send both virtio_net_rss_config and
virtio_net_hash_config, because virtio_net_rss_config does not carry
enabled_tunnel_types due to misalignment.

So, I think the following structure makes it clearer to configure the inner
header hash and RSS/hash calculation.
But in any case, if we still propose to reuse virtio_net_hash_config,
I am ok, no objection:

1. The supported_tunnel_types are placed in the device config space;

2.
Reserve the following structure:

  struct virtnet_hash_tunnel {
le32 enabled_tunnel_types;
  };

3. Reserve the SET command for enabled_tunnel_types and remove the GET
command for enabled_tunnel_types.

[1] https://lists.oasis-open.org/archives/virtio-dev/202303/msg00317.html

Thanks a lot!




[virtio-dev] Re: [PATCH 0/4] Short document fixes to inner hash feature

2023-07-13 Thread Heng Qi




在 2023/7/13 下午7:33, Michael S. Tsirkin 写道:

On Thu, Jul 13, 2023 at 11:12:13AM +0200, Cornelia Huck wrote:

On Thu, Jul 13 2023, Parav Pandit  wrote:


This short patches fixes the editing errors that stops the pdf and html
generation.

These 3 fixes are for the patch [1] for the github issue [2].

[1] https://lists.oasis-open.org/archives/virtio-comment/202307/msg00024.html
[2] https://github.com/oasis-tcs/virtio-spec/issues/173

Patch summary:
patch-1 place C code under listing
patch-2 avoid hyphen and extra braces
patch-3 use table as hyperlink do not work well in C code listing
patch-4 refer 'advice' as 'note'

Patch 1 to 3 appears to be must in the testing.
Patch 4 is not a fix and can be done later if it requires discussion.

Parav Pandit (4):
   virtio-net: Place C code under listing
   virtio-net: Avoid hyphen and extra braces
   virtio-net: Use table to describe inner hash to rfc mapping
   virtio-net: Use note instead of advice

  device-types/net/description.tex | 45 ++--
  introduction.tex | 15 +--
  2 files changed, 38 insertions(+), 22 deletions(-)


FTR, this is the diff I have locally (I had missed one underscore in the
references yesterday...); maybe we can make the intra-reference links in
introduction.tex a bit nicer, but otherwise, this should be the minimal
change to make this build:

Perfect. Seems like clearly an editorial fix.

Heng Qi, in the future I'd like to ask you to please build the
spec and review the resulting PDF and HTML, before posting.


Yes, I will!






diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 206020de567d..76585b0dd6d3 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1024,12 +1024,14 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 If VIRTIO_NET_F_HASH_TUNNEL has been negotiated, the driver can send the command
 VIRTIO_NET_CTRL_HASH_TUNNEL_SET to configure the calculation of the inner header hash.
 
+\begin{lstlisting}
 struct virtnet_hash_tunnel {
 le32 enabled_tunnel_types;
 };
 
 #define VIRTIO_NET_CTRL_HASH_TUNNEL 7
  #define VIRTIO_NET_CTRL_HASH_TUNNEL_SET 0
+\end{lstlisting}
 
 Field \field{enabled_tunnel_types} contains the bitmask of encapsulation types enabled for inner header hash.
 See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets /
@@ -1063,16 +1065,16 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 Hash calculation for incoming packets / Encapsulation types supported/enabled for inner header hash}
 
 Encapsulation types applicable for inner header hash:

-\begin{lstlisting}
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2784(1 << 0) /* \hyperref[intro:gre_rfc2784]{[GRE_rfc2784]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2890(1 << 1) /* \hyperref[intro:gre_rfc2890]{[GRE_rfc2890]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_7676(1 << 2) /* \hyperref[intro:gre_rfc7676]{[GRE_rfc7676]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_UDP (1 << 3) /* \hyperref[intro:gre_in_udp_rfc8086]{[GRE-in-UDP]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1 << 4) /* \hyperref[intro:vxlan]{[VXLAN]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN_GPE   (1 << 5) /* \hyperref[intro:vxlan_gpe]{[VXLAN-GPE]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1 << 6) /* \hyperref[intro:geneve]{[GENEVE]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP(1 << 7) /* \hyperref[intro:ipip]{[IPIP]} */
-#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (1 << 8) /* \hyperref[intro:nvgre]{[NVGRE]} */
+\begin{lstlisting}[escapechar=|]
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2784(1 << 0) /* |\hyperref[intro:rfc2784]{[RFC2784]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_2890(1 << 1) /* |\hyperref[intro:rfc2890]{[RFC2890]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_7676(1 << 2) /* |\hyperref[intro:rfc7676]{[RFC7676]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE_UDP (1 << 3) /* |\hyperref[intro:rfc8086]{[GRE-in-UDP]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1 << 4) /* |\hyperref[intro:vxlan]{[VXLAN]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN_GPE   (1 << 5) /* |\hyperref[intro:vxlan-gpe]{[VXLAN-GPE]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1 << 6) /* |\hyperref[intro:geneve]{[GENEVE]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP(1 << 7) /* |\hyperref[intro:ipip]{[IPIP]}| */
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (1 << 8) /* |\hyperref[intro:nvgre]{[NVGRE]}| */
  \end{lstlisting}
  
  \subparagraph{Advice}

diff --git a/introduction.tex b/introduction.tex
index 81f07a4fee19..6f10a94b6fde 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -102,18 +102,18 @@ \section{N

[virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-15 Thread Heng Qi

Hi all!

I would like to ask if anyone has any comments on this version, if so 
please let me know!
If not, I will collect Michael's comments and publish a new version next 
Monday.


Since Christmas is coming, I think this feature may be in danger of 
following the pace of
our hw version releases, so I sincerely request that you please review 
it as soon as possible.


Thanks!

在 2023/12/12 下午5:30, Heng Qi 写道:



在 2023/12/12 下午5:23, Heng Qi 写道:



在 2023/12/12 下午4:44, Michael S. Tsirkin 写道:

On Tue, Dec 12, 2023 at 11:28:21AM +0800, Heng Qi wrote:


在 2023/12/12 上午12:35, Michael S. Tsirkin 写道:

On Mon, Dec 11, 2023 at 05:11:59PM +0800, Heng Qi wrote:
virtio-net works in a virtualized system and is somewhat 
different from

physical nics. One of the differences is that to save virtio device
resources, rx may receive partially checksummed packets. However, 
XDP may

cause partially checksummed packets to be dropped.
So XDP loading currently conflicts with the feature 
VIRTIO_NET_F_GUEST_CSUM.


This patch lets the device supply fully checksummed packets to the driver.
Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM to enjoy the benefits of
device checksum validation.

In addition, implementations of some performant devices never generate
partially checksummed packets, but the standard driver still needs to clear
VIRTIO_NET_F_GUEST_CSUM when XDP is there.
A new feature VIRTIO_NET_F_GUEST_FULLY_CSUM is added to solve the 
above

situation, which provides the driver with configurable offload.
If the offload is enabled, then the device must deliver fully
checksummed packets to the driver and may validate the checksum.

Use case example:
If VIRTIO_NET_F_GUEST_FULLY_CSUM is negotiated and the offload is 
enabled,
after XDP processes a fully checksummed packet, the 
VIRTIO_NET_HDR_F_DATA_VALID bit
is retained if the device has validated its checksum, resulting 
in the guest
not needing to validate the checksum again. This is useful for 
guests:

    1. Bring the driver advantages such as cpu savings.
    2. For devices that do not generate partially checksummed 
packets themselves,
   XDP can be loaded in the driver without modifying the 
hardware behavior.


Several solutions have been discussed in the previous proposal[1].
After historical discussion, we have tried the method proposed by 
Jason[2],
but some complex scenarios and challenges are difficult to deal 
with.

We now return to the method suggested in [1].

[1] 
https://lists.oasis-open.org/archives/virtio-dev/202305/msg00291.html 

[2] 
https://lore.kernel.org/all/20230628030506.2213-1-hen...@linux.alibaba.com/


Signed-off-by: Heng Qi 
Reviewed-by: Xuan Zhuo 
---
v4->v5:
- Remove the modification to the GUEST_CSUM.
- The description of this feature has been reorganized for 
greater clarity.


v3->v4:
- Streamline some repetitive descriptions. @Jason
- Add how features should work, when to be enabled, and overhead. 
@Jason @Michael


v2->v3:
- Add a section named "Driver Handles Fully Checksummed Packets"
    and more descriptions. @Michael

v1->v2:
- Modify full checksum functionality as a configurable offload
    that is initially turned off. @Jason

   device-types/net/description.tex    | 74 
+++--

   device-types/net/device-conformance.tex |  1 +
   device-types/net/driver-conformance.tex |  1 +
   introduction.tex    |  3 +
   4 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/device-types/net/description.tex 
b/device-types/net/description.tex

index aff5e08..ab6c13d 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -122,6 +122,9 @@ \subsection{Feature bits}\label{sec:Device 
Types / Network Device / Feature bits

   device with the same MAC address.
   \item[VIRTIO_NET_F_SPEED_DUPLEX(63)] Device reports speed and 
duplex.

+
+\item[VIRTIO_NET_F_GUEST_FULLY_CSUM (64)] Device delivers fully 
checksummed packets

+    to the driver and may validate the checksum.
   \end{description}

I propose
VIRTIO_NET_F_GUEST_CSUM_COMPLETE
instead.

Can I ask here if *complete* in VIRTIO_NET_F_GUEST_CSUM_COMPLETE and
CHECKSUM_COMPLETE mean the same thing?

If so, it seems that it's no longer the same as the description of 
this

patch.

Oh. I thought it is. Then I guess I misunderstand what this patch is
supposed to be doing, again.


Here's some context:

From the perspective of the Linux kernel, the GUEST_CSUM feature is negotiated
to support (1) CHECKSUM_NONE, (2) CHECKSUM_UNNECESSARY, (3) CHECKSUM_PARTIAL,
which respectively correspond to: (1) the device does not validate the packet
checksum (it may not be able to validate some protocols or may not recognize
the packet); (2) the device has validated the packet and then sets the
DATA_VALID bit in flags; (3) to save device resources, VMs on the same host
deliver partially checksummed packets, and the NEEDS_CSUM bit is set in flags.

[virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-17 Thread Heng Qi




在 2023/12/18 上午11:10, Jason Wang 写道:

On Fri, Dec 15, 2023 at 5:51 PM Heng Qi  wrote:

Hi all!

I would like to ask if anyone has any comments on this version, if so
please let me know!
If not, I will collect Michael's comments and publish a new version next
Monday.

I have a dumb question. (And sorry if I asked it before)

Looking at the spec and code. It looks to me DATA_VALID could be set
without GUEST_CSUM.


I don't see that in the spec.
Am I missing something? [1][2]

[1] If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the 
VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has 
validated the packet checksum. In case of multiple encapsulated 
protocols, one level of checksums has been validated.
Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN features 
*enable receive checksum*, large receive offload and ECN support which 
are the input equivalents of the transmit checksum, transmit 
segmentation *offloading* and ECN features, as described in 5.1.6.2.


[2] If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MUST set 
flags to zero* and SHOULD supply a fully checksummed packet to the driver.


I think the feature bit is not checked in the code because the check
operates on a per-packet basis, just like the reason supported_valid_types
is not needed, as discussed in the v4 version threads. It is not unnecessary.


Thanks!



If yes, why do we need to bother here? If we disable GUEST_CSUM, the
packet will contain checksum. And if the device sets DATA_VALID, it
means the checksum is validated.

Thanks




Since Christmas is coming, I think this feature may be in danger of
following the pace of
our hw version releases, so I sincerely request that you please review
it as soon as possible.

Thanks!

在 2023/12/12 下午5:30, Heng Qi 写道:


在 2023/12/12 下午5:23, Heng Qi 写道:


在 2023/12/12 下午4:44, Michael S. Tsirkin 写道:

On Tue, Dec 12, 2023 at 11:28:21AM +0800, Heng Qi wrote:

在 2023/12/12 上午12:35, Michael S. Tsirkin 写道:

On Mon, Dec 11, 2023 at 05:11:59PM +0800, Heng Qi wrote:

virtio-net works in a virtualized system and is somewhat
different from
physical nics. One of the differences is that to save virtio device
resources, rx may receive partially checksummed packets. However,
XDP may
cause partially checksummed packets to be dropped.
So XDP loading currently conflicts with the feature
VIRTIO_NET_F_GUEST_CSUM.

This patch lets the device supply fully checksummed packets to the driver.
Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM to enjoy the benefits of
device checksum validation.

In addition, implementations of some performant devices never generate
partially checksummed packets, but the standard driver still needs to clear
VIRTIO_NET_F_GUEST_CSUM when XDP is there.
A new feature VIRTIO_NET_F_GUEST_FULLY_CSUM is added to solve the
above
situation, which provides the driver with configurable offload.
If the offload is enabled, then the device must deliver fully
checksummed packets to the driver and may validate the checksum.

Use case example:
If VIRTIO_NET_F_GUEST_FULLY_CSUM is negotiated and the offload is
enabled,
after XDP processes a fully checksummed packet, the
VIRTIO_NET_HDR_F_DATA_VALID bit
is retained if the device has validated its checksum, resulting
in the guest
not needing to validate the checksum again. This is useful for
guests:
 1. Bring the driver advantages such as cpu savings.
 2. For devices that do not generate partially checksummed
packets themselves,
XDP can be loaded in the driver without modifying the
hardware behavior.

Several solutions have been discussed in the previous proposal[1].
After historical discussion, we have tried the method proposed by
Jason[2],
but some complex scenarios and challenges are difficult to deal
with.
We now return to the method suggested in [1].

[1]
https://lists.oasis-open.org/archives/virtio-dev/202305/msg00291.html

[2]
https://lore.kernel.org/all/20230628030506.2213-1-hen...@linux.alibaba.com/

Signed-off-by: Heng Qi 
Reviewed-by: Xuan Zhuo 
---
v4->v5:
- Remove the modification to the GUEST_CSUM.
- The description of this feature has been reorganized for
greater clarity.

v3->v4:
- Streamline some repetitive descriptions. @Jason
- Add how features should work, when to be enabled, and overhead.
@Jason @Michael

v2->v3:
- Add a section named "Driver Handles Fully Checksummed Packets"
 and more descriptions. @Michael

v1->v2:
- Modify full checksum functionality as a configurable offload
 that is initially turned off. @Jason

device-types/net/description.tex| 74
+++--
device-types/net/device-conformance.tex |  1 +
device-types/net/driver-conformance.tex |  1 +
introduction.tex|  3 +
4 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/device-types/net/description.tex
b/device-types/net/description.tex
index aff5

Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-19 Thread Heng Qi




在 2023/12/19 下午3:53, Jason Wang 写道:

On Mon, Dec 18, 2023 at 12:54 PM Heng Qi  wrote:



在 2023/12/18 上午11:10, Jason Wang 写道:

On Fri, Dec 15, 2023 at 5:51 PM Heng Qi  wrote:

Hi all!

I would like to ask if anyone has any comments on this version, if so
please let me know!
If not, I will collect Michael's comments and publish a new version next
Monday.

I have a dumb question. (And sorry if I asked it before)

Looking at the spec and code. It looks to me DATA_VALID could be set
without GUEST_CSUM.

I don't see that in the spec.
Am I missing something? [1][2]

[1] If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has
validated the packet checksum. In case of multiple encapsulated
protocols, one level of checksums has been validated.
Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN features
*enable receive checksum*, large receive offload and ECN support which
are the input equivalents of the transmit checksum, transmit
segmentation *offloading* and ECN features, as described in 5.1.6.2.

[2] If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MUST set
flags to zero* and SHOULD supply a fully checksummed packet to the driver.

So this is kind of ambiguous and seems not what I wanted when I wrote
the code for DATA_VALID in 2011.


Hi Jason, please see below.



NEEDS_CSUM maps to CHECKSUM_PARTIAL which means the packet checksum is
correct.


Yes. This mapping is because the PARTIAL checksum usually does not go 
through the physical wire,

so it is considered safe, and the checksum does not need to be verified.


So spec had

"""
If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor VIRTIO_NET_HDR_F_DATA_VALID
is set, the driver MUST NOT rely on the packet checksum being correct.
"""


Yes. The checksum of a packet with neither NEEDS_CSUM nor 
DATA_VALID (checksum verified) set is unreliable.

This patch doesn't break that.



For DATA_VALID, it maps to CHECKSUM_UNNECESSARY which is mutually
exclusive with CHECKSUM_PARTIAL.


Yes. Both cannot be set or appear at the same time.


And this is what Linux did right now:

For tun_put_user():

 if (skb->ip_summed == CHECKSUM_PARTIAL) {
 ...
 } else if (has_data_valid &&
skb->ip_summed == CHECKSUM_UNNECESSARY) {
hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
 } /* else everything is zero */

This CHECKSUM_UNNECESSARY will work even if GUEST_CSUM is disabled if
I was not wrong.


I think you are talking about this commit: 
10a8d94a95742bb15b4e617ee9884bb4381362be


But in fact, as your commit log says, I think this is a hack. Host NICs 
do not fall into the scope of the virtio spec?





And in receive_buf():

 if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
 skb->ip_summed = CHECKSUM_UNNECESSARY;

I think we can fix this by safely removing "*MUST set flags to zero*"
in [2] from the spec.


Sorry. I cannot follow this view.

1. First of all, VIRTIO_NET_F_GUEST_CSUM (partial csum is not considered 
now, because we have no dispute about it) does represent the device's 
ability to calculate and verify checksums.
Its ability to handle partial checksums (NEEDS_CSUM) is just special 
processing in virtio; the Linux kernel never had a netdev feature for 
partial checksum handling.


  1.1 VIRTIO_NET_F_GUEST_{TSO4, TSO6, USO4} etc. depend on 
VIRTIO_NET_F_GUEST_CSUM.
    They depend on it not because they are related to NEEDS_CSUM, 
but because the device needs to recalculate and verify the 
checksum of the packets when merging them.

    See netdev_fix_features:

	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
		dev->features |= NETIF_F_RXCSUM;

  - netdev_fix_features ->

	if (!(features & NETIF_F_RXCSUM)) {
		/* NETIF_F_GRO_HW implies doing RXCSUM since every packet
		 * successfully merged by hardware must also have the
		 * checksum verified by hardware. If the user does not
		 * want to enable RXCSUM, logically, we should disable GRO_HW.
		 */
		if (features & NETIF_F_GRO_HW) {
			netdev_dbg(dev, "Dropping NETIF_F_GRO_HW since no RXCSUM feature.\n");
			features &= ~NETIF_F_GRO_HW;
		}
	}

  1.2 See NETIF_F_RXCSUM_BIT    /* Receive checksumming offload */
 Most device drivers use NETIF_F_RXCSUM to indicate device checksum 
capabilities, and the corresponding offload can be dynamically switched 
on and off by user tools such as ethtool.


2. The vhost-user implementations, the large-scale commercial virtio 
devices that I know of, and other devices are completely designed and 
implemented in accordance with virtio 1.0 and later. They comply with 

Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-19 Thread Heng Qi




On 2023/12/20 1:48 PM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 12:07 AM Heng Qi  wrote:



On 2023/12/19 3:53 PM, Jason Wang wrote:

On Mon, Dec 18, 2023 at 12:54 PM Heng Qi  wrote:


On 2023/12/18 11:10 AM, Jason Wang wrote:

On Fri, Dec 15, 2023 at 5:51 PM Heng Qi  wrote:

Hi all!

I would like to ask if anyone has any comments on this version, if so
please let me know!
If not, I will collect Michael's comments and publish a new version next
Monday.

I have a dumb question. (And sorry if I asked it before)

Looking at the spec and code. It looks to me DATA_VALID could be set
without GUEST_CSUM.

I don't see that in the spec.
Am I missing something? [1][2]

[1] If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has
validated the packet checksum. In case of multiple encapsulated
protocols, one level of checksums has been validated.
Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN features
*enable receive checksum*, large receive offload and ECN support which
are the input equivalents of the transmit checksum, transmit
segmentation *offloading* and ECN features, as described in 5.1.6.2.

[2] If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MUST set
flags to zero* and SHOULD supply a fully checksummed packet to the driver.

So this is kind of ambiguous and seems not what I wanted when I wrote
the code for DATA_VALID in 2011.

Hi Jason, please see below.


NEEDS_CSUM maps to CHECKSUM_PARTIAL which means the packet checksum is
correct.

Yes. This mapping is because the PARTIAL checksum usually does not go
through the physical wire,
so it is considered safe, and the checksum does not need to be verified.


So spec had

"""
If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor VIRTIO_NET_HDR_F_DATA_VALID
is set, the driver MUST NOT rely on the packet checksum being correct.
"""

Yes. The checksum of a packet without NEEDS_CSUM or has not been
verified (DATA_VALID set) is unreliable.
This patch doesn't break that.


For DATA_VALID, it maps to CHECKSUM_UNNECESSARY which is mutually
exclusive with CHECKSUM_PARTIAL.

Yes. Both cannot be set or appear at the same time.

So setting both DATA_VALID and NEEDS_CSUM seems ambiguous.

NEEDS_CSUM: the data is correct but the packet doesn't contain checksum


It is not that the packet contains no checksum; the pseudo-header 
checksum is saved in the checksum field of the transport header.



DATA_VALID: the checksum has been validated, this implies the packet
contains a checksum


I'm not sure if both are set at the same time, and even if set, 
CHECKSUM_PARTIAL will still work when forwarded.

But why are we discussing this?




And this is what Linux did right now:

For tun_put_user():

  if (skb->ip_summed == CHECKSUM_PARTIAL) {
  ...
  } else if (has_data_valid &&
 skb->ip_summed == CHECKSUM_UNNECESSARY) {
 hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
  } /* else everything is zero */

This CHECKSUM_UNNECESSARY will work even if GUEST_CSUM is disabled if
I was not wrong.

I think you are talking about this commit:
10a8d94a95742bb15b4e617ee9884bb4381362be

But in fact, as your commit log says, I think this is a hack.

It's not, see below.


Host NICs
do not fall into the scope of the virtio spec?

Seems not, a lot of NIC produces CHECKSUM_UNNECESSARY, I don't see how
virtio-net differs in this case.




And in receive_buf():

  if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
  skb->ip_summed = CHECKSUM_UNNECESSARY;

I think we can fix this by safely removing "*MUST set flags to zero*"
in [2] from the spec.

Sorry. I cannot follow this view.

1. First of all, VIRTIO_NET_F_GUEST_CSUM (partial csum is not considered
now, because we have no dispute about it) does represent the device's
ability to calculate and verify checksums.
Its ability to handle partial checksums (NEEDS_CSUM) is just a special
processing of virtio, the Linux kernel never had a netdev feature for
partial checksum handling.

1.1 VIRTIO_NET_F_GUEST_{TSO4, TSO6, USO4} etc. depend on
VIRTIO_NET_F_GUEST_CSUM.
  The reason for being relied upon is not that they are related
to NEEDS_CSUM, but that the device needs to recalculate and verify the
checksum of the packets when merging the packets.
  See netdev_fix_features:
 if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
   dev->features |= NETIF_F_RXCSUM;
- netdev_fix_features ->
 if (!(features & NETIF_F_RXCSUM)) {
   /* NETIF_F_GRO_HW implies doing RXCSUM since every packet
* successfully merged by hardware must also have the
* checksum verified by hardware. If the user does not
* want to enable RXCSUM, logically, we should disable
GRO_HW.
  

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-19 Thread Heng Qi




On 2023/12/20 2:59 PM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 2:30 PM Heng Qi  wrote:



On 2023/12/20 1:48 PM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 12:07 AM Heng Qi  wrote:


On 2023/12/19 3:53 PM, Jason Wang wrote:

On Mon, Dec 18, 2023 at 12:54 PM Heng Qi  wrote:

On 2023/12/18 11:10 AM, Jason Wang wrote:

On Fri, Dec 15, 2023 at 5:51 PM Heng Qi  wrote:

Hi all!

I would like to ask if anyone has any comments on this version, if so
please let me know!
If not, I will collect Michael's comments and publish a new version next
Monday.

I have a dumb question. (And sorry if I asked it before)

Looking at the spec and code. It looks to me DATA_VALID could be set
without GUEST_CSUM.

I don't see that in the spec.
Am I missing something? [1][2]

[1] If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has
validated the packet checksum. In case of multiple encapsulated
protocols, one level of checksums has been validated.
Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN features
*enable receive checksum*, large receive offload and ECN support which
are the input equivalents of the transmit checksum, transmit
segmentation *offloading* and ECN features, as described in 5.1.6.2.

[2] If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MUST set
flags to zero* and SHOULD supply a fully checksummed packet to the driver.

So this is kind of ambiguous and seems not what I wanted when I wrote
the code for DATA_VALID in 2011.

Hi Jason, please see below.


NEEDS_CSUM maps to CHECKSUM_PARTIAL which means the packet checksum is
correct.

Yes. This mapping is because the PARTIAL checksum usually does not go
through the physical wire,
so it is considered safe, and the checksum does not need to be verified.


So spec had

"""
If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor VIRTIO_NET_HDR_F_DATA_VALID
is set, the driver MUST NOT rely on the packet checksum being correct.
"""

Yes. The checksum of a packet without NEEDS_CSUM or has not been
verified (DATA_VALID set) is unreliable.
This patch doesn't break that.


For DATA_VALID, it maps to CHECKSUM_UNNECESSARY which is mutually
exclusive with CHECKSUM_PARTIAL.

Yes. Both cannot be set or appear at the same time.

So setting both DATA_VALID and NEEDS_CSUM seems ambiguous.

NEEDS_CSUM: the data is correct but the packet doesn't contain checksum

It is not that the packet contains no checksum; the pseudo-header
checksum is saved in the checksum field of the transport header.

I have a hard time understanding this. But yes, basically I meant the
checksum is partial. So the device can't do validation.


If the rx device does receive a partially checksummed packet, but the 
driver requires a fully checksummed packet, then the rx device can help 
to calculate the full checksum for such packets.





DATA_VALID: the checksum has been validated, this implies the packet
contains a checksum

I'm not sure if both are set at the same time, and even if set,
CHECKSUM_PARTIAL will still work when forwarded.
But why are we discussing this?

I don't get this question.

As a reviewer, I have the right to raise any issue I spot. This is how
the community works.


Sorry I wasn't questioning your question, and I think you captured the 
concerns very well from a nic perspective.




It is intended to reply to the past discussion

1) like your above statement "Both cannot be set or appear at the same time."
2) the example in Linux where CHECKSUM_UNNECESSARY and
CHECKSUM_PARTIAL are mutually exclusive.


And this is what Linux did right now:

For tun_put_user():

   if (skb->ip_summed == CHECKSUM_PARTIAL) {
   ...
   } else if (has_data_valid &&
  skb->ip_summed == CHECKSUM_UNNECESSARY) {
  hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
   } /* else everything is zero */

This CHECKSUM_UNNECESSARY will work even if GUEST_CSUM is disabled if
I was not wrong.

I think you are talking about this commit:
10a8d94a95742bb15b4e617ee9884bb4381362be

But in fact, as your commit log says, I think this is a hack.

It's not, see below.


Host NICs
do not fall into the scope of the virtio spec?

Seems not, a lot of NIC produces CHECKSUM_UNNECESSARY, I don't see how
virtio-net differs in this case.


And in receive_buf():

   if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
   skb->ip_summed = CHECKSUM_UNNECESSARY;

I think we can fix this by safely removing "*MUST set flags to zero*"
in [2] from the spec.

Sorry. I cannot follow this view.

1. First of all, VIRTIO_NET_F_GUEST_CSUM (partial csum is not considered
now, because we have no dispute about it) does represent the device's
ability to calculate and verify checksums.
Its ability to handle partial checksums (NEEDS_CSUM) is just a special
processing of virtio, the L

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-20 Thread Heng Qi




On 2023/12/20 3:35 PM, Michael S. Tsirkin wrote:

On Wed, Dec 20, 2023 at 02:30:01PM +0800, Heng Qi wrote:

But why are we discussing this?

I think basically at this point everyone is confused about what
the feature does. right now we have packets
with
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1   -> partial
#define VIRTIO_NET_HDR_F_DATA_VALID 2   -> unnecessary
and packets without either  -> none

if both 1 and 2 are set then linux uses VIRTIO_NET_HDR_F_NEEDS_CSUM but
I am not sure it's not a mistake. Maybe it does not matter.

What does this new thing do? So far all we have is "XDP will turn it on"
which is not really sufficient. I assumed it somehow replaces
partial with complete. That would make sense for many reasons,
for example the checksum fields in the header can be reused
for other purposes. But maybe not?



Hello Jason and Michael. I've summarized our discussion so far; please 
check it out below. Thank you very much!


From the NIC perspective, I think Jason's statement is correct: the 
NIC's checksum capability and the setting of DATA_VALID in flags should 
not be determined by the GUEST_CSUM feature. As long as rx checksum 
offload is turned on, DATA_VALID should be set. (Though we currently 
bind GUEST_CSUM negotiation to rx checksum offload.)


Therefore, we need to pay attention to the information of rx checksum 
offload. Please check it out:


Devices that comply with the below description are said to be existing 
devices:
    "If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MUST* 
set flags to zero and SHOULD supply a fully checksummed packet to the 
driver."


As suggested by Jason, devices that comply with the below description 
are said to be new devices:
    "If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MAY* set 
flags to zero and SHOULD supply a fully checksummed packet to the driver."



1. Rx checksum offload is turned on, and the GUEST_CSUM feature is not 
negotiated. (Now the feature is only used to indicate whether the driver 
can handle partially checksummed packets.)

   a. Existing devices continue to set flags to 0;
   b. New devices may validate the packets and have flags set to 
DATA_VALID;

   c. Migration.
   Migration of existing devices continues to check GUEST_CSUM 
feature and rx checksum offload;

   Migration of new devices only check rx checksum offload;
   Without updating the existing migration management and control 
system, existing devices cannot be migrated to new devices, and new 
devices cannot be migrated to existing devices.
   d. How offload should be controlled now needs attention. Should 
CTRL_GUEST_OFFLOADS still use the GUEST_CSUM feature bit to control rx 
checksum offload?


2. The new FULLY_CSUM feature must disable NEEDS_CSUM.
The device may set DATA_VALID regardless of whether FULLY_CSUM or 
GUEST_CSUM is negotiated.
   a. Rx fully checksum offload is still controlled by 
CTRL_GUEST_OFFLOADS carrying GUEST_FULLY_CSUM.
   b. When the rx device receives a partially checksummed packet, it 
should calculate the checksum and deliver a fully checksummed packet to 
the driver.



So now, if we modify the existing spec as Jason suggested, I think it's OK.
But we need to find out how to control rx checksum offload. WDYT?

Thanks!







-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-20 Thread Heng Qi




On 2023/12/20 2:59 PM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 2:30 PM Heng Qi  wrote:



On 2023/12/20 1:48 PM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 12:07 AM Heng Qi  wrote:


On 2023/12/19 3:53 PM, Jason Wang wrote:

On Mon, Dec 18, 2023 at 12:54 PM Heng Qi  wrote:

On 2023/12/18 11:10 AM, Jason Wang wrote:

On Fri, Dec 15, 2023 at 5:51 PM Heng Qi  wrote:

Hi all!

I would like to ask if anyone has any comments on this version, if so
please let me know!
If not, I will collect Michael's comments and publish a new version next
Monday.

I have a dumb question. (And sorry if I asked it before)

Looking at the spec and code. It looks to me DATA_VALID could be set
without GUEST_CSUM.

I don't see that in the spec.
Am I missing something? [1][2]

[1] If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has
validated the packet checksum. In case of multiple encapsulated
protocols, one level of checksums has been validated.
Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN features
*enable receive checksum*, large receive offload and ECN support which
are the input equivalents of the transmit checksum, transmit
segmentation *offloading* and ECN features, as described in 5.1.6.2.

[2] If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MUST set
flags to zero* and SHOULD supply a fully checksummed packet to the driver.

So this is kind of ambiguous and seems not what I wanted when I wrote
the code for DATA_VALID in 2011.

Hi Jason, please see below.


NEEDS_CSUM maps to CHECKSUM_PARTIAL which means the packet checksum is
correct.

Yes. This mapping is because the PARTIAL checksum usually does not go
through the physical wire,
so it is considered safe, and the checksum does not need to be verified.


So spec had

"""
If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor VIRTIO_NET_HDR_F_DATA_VALID
is set, the driver MUST NOT rely on the packet checksum being correct.
"""

Yes. The checksum of a packet without NEEDS_CSUM or has not been
verified (DATA_VALID set) is unreliable.
This patch doesn't break that.


For DATA_VALID, it maps to CHECKSUM_UNNECESSARY which is mutually
exclusive with CHECKSUM_PARTIAL.

Yes. Both cannot be set or appear at the same time.

So setting both DATA_VALID and NEEDS_CSUM seems ambiguous.

NEEDS_CSUM: the data is correct but the packet doesn't contain checksum

It is not that the packet contains no checksum; the pseudo-header
checksum is saved in the checksum field of the transport header.

I have a hard time understanding this. But yes, basically I meant the
checksum is partial. So the device can't do validation.


DATA_VALID: the checksum has been validated, this implies the packet
contains a checksum

I'm not sure if both are set at the same time, and even if set,
CHECKSUM_PARTIAL will still work when forwarded.
But why are we discussing this?

I don't get this question.

As a reviewer, I have the right to raise any issue I spot. This is how
the community works.

It is intended to reply to the past discussion

1) like your above statement "Both cannot be set or appear at the same time."
2) the example in Linux where CHECKSUM_UNNECESSARY and
CHECKSUM_PARTIAL are mutually exclusive.


And this is what Linux did right now:

For tun_put_user():

   if (skb->ip_summed == CHECKSUM_PARTIAL) {
   ...
   } else if (has_data_valid &&
  skb->ip_summed == CHECKSUM_UNNECESSARY) {
  hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
   } /* else everything is zero */

This CHECKSUM_UNNECESSARY will work even if GUEST_CSUM is disabled if
I was not wrong.

I think you are talking about this commit:
10a8d94a95742bb15b4e617ee9884bb4381362be

But in fact, as your commit log says, I think this is a hack.

It's not, see below.


Host NICs
do not fall into the scope of the virtio spec?

Seems not, a lot of NIC produces CHECKSUM_UNNECESSARY, I don't see how
virtio-net differs in this case.


And in receive_buf():

   if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
   skb->ip_summed = CHECKSUM_UNNECESSARY;

I think we can fix this by safely removing "*MUST set flags to zero*"
in [2] from the spec.

Sorry. I cannot follow this view.

1. First of all, VIRTIO_NET_F_GUEST_CSUM (partial csum is not considered
now, because we have no dispute about it) does represent the device's
ability to calculate and verify checksums.
Its ability to handle partial checksums (NEEDS_CSUM) is just a special
processing of virtio, the Linux kernel never had a netdev feature for
partial checksum handling.

 1.1 VIRTIO_NET_F_GUEST_{TSO4, TSO6, USO4} etc. depend on
VIRTIO_NET_F_GUEST_CSUM.
   The reason for being relied upon is not that they are related
to NEEDS_CSUM, but that the device needs to recalculate and verify the
checksum of the packe

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-20 Thread Heng Qi




On 2023/12/21 9:34 AM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 3:42 PM Heng Qi  wrote:



On 2023/12/20 2:59 PM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 2:30 PM Heng Qi  wrote:


On 2023/12/20 1:48 PM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 12:07 AM Heng Qi  wrote:

On 2023/12/19 3:53 PM, Jason Wang wrote:

On Mon, Dec 18, 2023 at 12:54 PM Heng Qi  wrote:

On 2023/12/18 11:10 AM, Jason Wang wrote:

On Fri, Dec 15, 2023 at 5:51 PM Heng Qi  wrote:

Hi all!

I would like to ask if anyone has any comments on this version, if so
please let me know!
If not, I will collect Michael's comments and publish a new version next
Monday.

I have a dumb question. (And sorry if I asked it before)

Looking at the spec and code. It looks to me DATA_VALID could be set
without GUEST_CSUM.

I don't see that in the spec.
Am I missing something? [1][2]

[1] If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has
validated the packet checksum. In case of multiple encapsulated
protocols, one level of checksums has been validated.
Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN features
*enable receive checksum*, large receive offload and ECN support which
are the input equivalents of the transmit checksum, transmit
segmentation *offloading* and ECN features, as described in 5.1.6.2.

[2] If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device *MUST set
flags to zero* and SHOULD supply a fully checksummed packet to the driver.

So this is kind of ambiguous and seems not what I wanted when I wrote
the code for DATA_VALID in 2011.

Hi Jason, please see below.


NEEDS_CSUM maps to CHECKSUM_PARTIAL which means the packet checksum is
correct.

Yes. This mapping is because the PARTIAL checksum usually does not go
through the physical wire,
so it is considered safe, and the checksum does not need to be verified.


So spec had

"""
If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor VIRTIO_NET_HDR_F_DATA_VALID
is set, the driver MUST NOT rely on the packet checksum being correct.
"""

Yes. The checksum of a packet without NEEDS_CSUM or has not been
verified (DATA_VALID set) is unreliable.
This patch doesn't break that.


For DATA_VALID, it maps to CHECKSUM_UNNECESSARY which is mutually
exclusive with CHECKSUM_PARTIAL.

Yes. Both cannot be set or appear at the same time.

So setting both DATA_VALID and NEEDS_CSUM seems ambiguous.

NEEDS_CSUM: the data is correct but the packet doesn't contain checksum

It is not that the packet contains no checksum; the pseudo-header
checksum is saved in the checksum field of the transport header.

I have a hard time understanding this. But yes, basically I meant the
checksum is partial. So the device can't do validation.

If the rx device does receive a partially checksummed packet, but the
driver requires a fully checksummed packet, then the rx device can help
to calculate the full checksum for such packets.

So this can only happen for virtual devices as hardware devices can't
receive partial csum packets.


YES. It should be.




DATA_VALID: the checksum has been validated, this implies the packet
contains a checksum

I'm not sure if both are set at the same time, and even if set,
CHECKSUM_PARTIAL will still work when forwarded.
But why are we discussing this?

I don't get this question.

As a reviewer, I have the right to raise any issue I spot. This is how
the community works.

Sorry I wasn't questioning your question, and I think you captured the
concerns very well from a nic perspective.

I see, thanks. I want to offer help indeed.


Thanks very much!




It is intended to reply to the past discussion

1) like your above statement "Both cannot be set or appear at the same time."
2) the example in Linux where CHECKSUM_UNNECESSARY and
CHECKSUM_PARTIAL are mutually exclusive.


And this is what Linux did right now:

For tun_put_user():

if (skb->ip_summed == CHECKSUM_PARTIAL) {
...
} else if (has_data_valid &&
   skb->ip_summed == CHECKSUM_UNNECESSARY) {
   hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
} /* else everything is zero */

This CHECKSUM_UNNECESSARY will work even if GUEST_CSUM is disabled if
I was not wrong.

I think you are talking about this commit:
10a8d94a95742bb15b4e617ee9884bb4381362be

But in fact, as your commit log says, I think this is a hack.

It's not, see below.


Host NICs
do not fall into the scope of the virtio spec?

Seems not, a lot of NIC produces CHECKSUM_UNNECESSARY, I don't see how
virtio-net differs in this case.


And in receive_buf():

if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
skb->ip_summed = CHECKSUM_UNNECESSARY;

I think we can fix this by safely removing "*MUST set flags to zero*"
in [2] from the spec.

Sorry. I cannot follow this view.

1. First o

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum

2023-12-20 Thread Heng Qi




On 2023/12/21 9:34 AM, Jason Wang wrote:

On Wed, Dec 20, 2023 at 3:35 PM Michael S. Tsirkin  wrote:

On Wed, Dec 20, 2023 at 02:30:01PM +0800, Heng Qi wrote:

But why are we discussing this?

I think basically at this point everyone is confused about what
the feature does. right now we have packets
with
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1   -> partial
#define VIRTIO_NET_HDR_F_DATA_VALID 2   -> unnecessary
and packets without either  -> none

if both 1 and 2 are set then linux uses VIRTIO_NET_HDR_F_NEEDS_CSUM but
I am not sure it's not a mistake. Maybe it does not matter.

What does this new thing do? So far all we have is "XDP will turn it on"
which is not really sufficient. I assumed it somehow replaces
partial with complete.

It looks not? CHECKSUM_COMPLETE is less optimal than
CHECKSUM_UNNECESSARY as validation is still needed.

If I understand correctly, this new thing wants DATA_VALID only.


Disable NEEDS_CSUM, or convert partially checksummed packets to fully 
checksummed packets (how this is done does not matter).
The driver will then only receive two types of packets: CHECKSUM_NONE 
and DATA_VALID (CHECKSUM_UNNECESSARY).


Thanks!



Thanks




That would make sense for many reasons,
for example the checksum fields in the header can be reused
for other purposes. But maybe not?


--
MST







Re: [virtio-dev] Re: [virtio-comment] RE: [virtio-dev] RE: [virtio-comment] [PATCH v2] virtio-net: support setting coalescing params for multiple vqs

2024-01-23 Thread Heng Qi




On 2024/1/23 3:15 PM, Michael S. Tsirkin wrote:

On Tue, Jan 23, 2024 at 05:55:02AM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Monday, January 22, 2024 1:06 PM

On Mon, Jan 22, 2024 at 05:03:38AM +, Parav Pandit wrote:

The right approach on Linux is to do this without the rtnl lock, which
is anyway ugly and the wrong semantic to use, blocking the whole netdev
stack (in case you used that).

Do you have any good directions and attempts to remove rtnl_lock?


I think a per-device lock instead of rtnl is the first step we can start with.

Will check internally whether someone has already started working on it.

I feel the issue is at the conceptual level.

Not for requests which are initiated by the kernel stack (non user initiated).

So how is this different? Is it basically just because
tweaking coalescing in unexpected ways is considered mostly
harmless?


DIM sends configurations frequently, on a best-effort basis.




Yes some drivers will take a command
and just queue it for execution later, but this means that errors can not be
propagated back at all. Imagine device with mac
0x123 in promisc mode. Now commands:

1- program MAC 0xabcdef
2- disable promisc mode


Errors for user-initiated commands can be propagated when the command completes.
Enqueuing the command happens at a different, lower level in the driver.


If command 1 fails but 2 proceeds then packets with MAC 0xabc will be
dropped.

Any attempts to batch arbitrary commands will have this issue - be it at driver
or device level.


There is no suggestion to batch arbitrary commands from the driver side.
The suggestion is to batch VQ notification coalescing commands from the driver side.


So, here's my question: what exactly is the guest behaviour that is driving this
work? Is it with a linux guest?

It looks that way to me, at least, based on the partial patches, which 
take the rtnl lock in netdim's worker callbacks.


which commands does userspace issue that we
need to send multiple vq coalescing commands?

None.


  If all you want is to send
same config to all VQs then why not just use
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET as opposed to
VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET ?

Only kernel-stack-initiated VQ notification coalescing changes.
Since every VQ has different values, VIRTIO_NET_CTRL_NOTF_COAL_RX_SET is not 
sufficient.






Re: [virtio-dev] RE: [virtio-comment] RE: [virtio-dev] RE: [virtio-comment] [PATCH v2] virtio-net: support setting coalescing params for multiple vqs

2024-01-24 Thread Heng Qi




On 2024/1/22 1:03 PM, Parav Pandit wrote:



From: Heng Qi 
Sent: Monday, January 22, 2024 8:27 AM

On 2024/1/20 5:59 PM, Parav Pandit wrote:

From: Heng Qi 
Sent: Wednesday, January 17, 2024 10:22 AM

On 2024/1/15 9:21 PM, Parav Pandit wrote:

From: virtio-comm...@lists.oasis-open.org
 On Behalf Of Heng Qi
Sent: Monday, January 15, 2024 6:36 PM

Currently, each time the driver attempts to update the
coalescing parameters for a vq, it needs to kick the device and
wait for the ctrlq response to return.

It does not need to wait. This is a driver limitation: the driver does not
use the queue as a "queue".

Such a driver limitation should be removed in the driver; it does not
qualify as a spec limitation.

Yes, we don't have to wait.

But in general, for user commands, it is necessary to obtain the
final result synchronously.

Yes. A user-initiated command can enqueue the request to the cvq and go to
sleep for several microseconds to milliseconds.

The user command cannot return before the final result is obtained.
And waiting is not the problem this patch solves.


By not holding the rtnl lock, the other contexts that need to enqueue
requests, such as netdim, can make progress.

I would like to see the use of the rtnl lock changed.


Inside virtnet_rx_dim_work() there should be no rtnl lock call.
A virtio_device-level lock should be used for the cvq instead. :)
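A rough sketch of that locking suggestion, with pthread as a stand-in (all names here are hypothetical, not the actual virtio-net driver code):

```c
#include <pthread.h>

/* Sketch of the suggestion above (all names hypothetical): guard the
 * control VQ with a per-device mutex so that a DIM worker can enqueue a
 * cvq command without taking the global rtnl lock. */
struct vdev {
	pthread_mutex_t cvq_lock; /* virtio_device-level lock, not rtnl */
	int cmds;                 /* stands in for the cvq ring */
};

static void dim_work(struct vdev *d)
{
	pthread_mutex_lock(&d->cvq_lock);
	d->cmds++;                /* enqueue a coalescing command */
	pthread_mutex_unlock(&d->cvq_lock);
}
```

The point of the narrower lock is that two devices, or a device and unrelated netlink work, no longer serialize on one global lock just to touch a cvq.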


In addition, I have made the netdim commands batched and asynchronous; you can
refer to this patch:
https://lore.kernel.org/all/1705410693-118895-4-git-send-email-
hen...@linux.alibaba.com/


In the driver patch listed above, the stated motivation "to optimize the
CPU overhead of the DIM worker caused by the guest being busy
waiting for the command response result"

is not right,
because the guest is still busy waiting.


There is no busy wait in the guest; see get_cvq_work().


Without batching, due to the rtnl lock, every VQ command is serialized as one
outstanding command at a time in virtnet_rx_dim_work().
Because of this, the device is unable to benefit from DMA batching at large scale.


Adding dim commands is now asynchronous, and the device will receive 
batches of commands.
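As a toy illustration of the batching shape being discussed (hypothetical names, not the actual patch): N enqueues followed by a single kick, instead of enqueue+kick+wait per VQ:

```c
#include <stdint.h>

#define MAX_CMDS 64

/* Toy model (hypothetical names) of batched cvq submission: the DIM
 * worker enqueues one coalescing command per VQ without waiting, then
 * notifies the device once, so the device sees a batch it can DMA and
 * process together. */
struct coal_cmd { uint16_t vqn; uint32_t usecs; uint32_t packets; };

struct cvq {
	struct coal_cmd pending[MAX_CMDS];
	int n_pending;
	int kicks; /* counts driver notifications to the device */
};

/* Enqueue without waiting for the previous command to complete. */
static int cvq_enqueue(struct cvq *q, struct coal_cmd c)
{
	if (q->n_pending >= MAX_CMDS)
		return -1;
	q->pending[q->n_pending++] = c;
	return 0;
}

static void cvq_kick(struct cvq *q) { q->kicks++; }

/* One batched update for nvqs queues: nvqs enqueues, a single kick. */
static int coal_update_all(struct cvq *q, int nvqs,
			   uint32_t usecs, uint32_t packets)
{
	for (uint16_t i = 0; i < nvqs; i++)
		if (cvq_enqueue(q, (struct coal_cmd){ i, usecs, packets }))
			return -1;
	cvq_kick(q);
	return 0;
}
```

Completion handling is deliberately omitted: the thread's open question is exactly how errors from such a batch get reported back.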


  

This will enable the driver to enqueue multiple cvq commands without
waiting

for the previous one.

This will also enable the device to find natural coalescing across
multiple

commands.

When batched user commands occur, ensuring synchronization is a concern.


The following path is observed: 1. The driver kicks the device; 2.
After the device receives the kick, CPU scheduling occurs and it DMAs
multiple buffers multiple times; 3. The device completes processing
and replies

with a response.

When large-queue devices issue multiple requests and kick the
device frequently, this often interrupts the work of the device-side CPU.

With a large device and multiple driver notifications issued by a
CPU that is N times faster than the device-side CPU, the device may
find natural

coalescing on the commands of a given cvq.

First we have to solve batch-adding user (ethtool) commands to the ctrlq.
Even if they are processed in a batched way on the device side, the number of
kicks and the number of backend DMAs is not reduced.

Driver notifications are PCI writes, so they should not hamper the device side,

which can ignore them when it does not care about them.

Driver notifications need to be processed by the DPU, which interferes with
the CPU on the DPU.


I was asking whether there is any way for your DPU to ignore these
notifications while the previous one is pending?
From your description above, it seems there isn't.


While the device is processing the request, additional kicks are ignored.
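The merge-while-busy behaviour described above can be modelled in a few lines (a device-side sketch under assumed semantics, not DPU firmware):

```c
#include <stdbool.h>

/* Device-side sketch (assumed semantics, not DPU firmware): while a cvq
 * request is being processed, further driver kicks are collapsed into a
 * single pending bit instead of interrupting the device-side CPU again. */
struct dev_cvq {
	bool busy;    /* currently processing a request */
	bool pending; /* a kick arrived while busy */
	int wakeups;  /* how many times processing actually started */
};

/* A driver notification (PCI write) arriving at the device. */
static void on_kick(struct dev_cvq *q)
{
	if (q->busy) {          /* already processing: merge the kick */
		q->pending = true;
		return;
	}
	q->busy = true;         /* start processing */
	q->wakeups++;
}

/* Processing finished: run once more only if kicks arrived meanwhile. */
static void on_done(struct dev_cvq *q)
{
	q->busy = false;
	if (q->pending) {
		q->pending = false;
		on_kick(q);
	}
}
```

With this model, any number of kicks during one processing pass cost exactly one extra wakeup, which is the "additional kicks are ignored" behaviour claimed above.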

Thanks.




Backend DMAs should be reduced by avoiding the LIFO pattern followed by

the splitq driver.

Placing the descriptors contiguously, as packedq does, naturally reduces the
amount of DMA.

splitq is widely used; migrating to packedq is not that easy, especially when
many components and much hardware are involved.

I am not suggesting moving to packed VQ.
I am suggesting fixing the driver to not do LIFO on descriptors for splitq.
In other words, using a contiguous set of descriptors on splitq will improve
its DMA behavior.
This will allow using splitq more efficiently as a short-term solution
until more efficient queues are defined.


The second predictable DMA can be avoided by having the 8 bytes of data inline
in the

descriptor, instead of a 16-byte indirection and an extra DMA.

Looking forward to the inline work!
But I think this does not conflict with the batching work, and combining the two
will be more beneficial.


It does not conflict. However, batching for a large number of queues may not be
able to use the inline data, as the bytes may not fit inline.
  

For multiple DMAs, we need a way to send 8 bytes of data without 16

bytes of indirection via a descriptor.

This is what we discussed a while back for the txq, and Stefan
suggested

generalizing it to more queues, which is also a good idea.

Yes, this sounds good.


This is the next item to focus on as soon as flow 

[virtio-dev] [PATCH] virtio_net: support split header

2022-07-31 Thread Heng Qi
From: Xuan Zhuo 

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1st descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads reside
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Specify what the device does if it does not perform header split on a 
packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of the used length if the first descriptor is skipped in 
the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case of
successful negotiation.

 conformance.tex |   2 +
 content.tex | 111 ++--
 2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..bd0f463 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..74c36fe 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
 channel.
 
+\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
+header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
#define VIRTIO_NET_HDR_F_NEEDS_CSUM  1
#define VIRTIO_NET_HDR_F_DATA_VALID  2
 #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
 u8 flags;
#define VIRTIO_NET_HDR_GSO_NONE  0
 #define VIRTIO_NET_HDR_GSO_TCPV4   1
@@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
 not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
 
 If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
-been negotiated, the device SHOULD set \field{hdr_len} to a value

[virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-01 Thread Heng Qi
The content of the mail just sent is about "[PATCH v6] virtio_net: 
support split header".


Re: [virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-04 Thread Heng Qi



Re: [virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-04 Thread Heng Qi


On 2022/8/4 9:50 PM, Cornelia Huck wrote:

On Thu, Aug 04 2022, Heng Qi  wrote:


On 2022/8/4 2:27 PM, Jason Wang wrote:

On Mon, Aug 1, 2022 at 2:59 PM Heng Qi   wrote:

@@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
   driver MUST NOT use the \field{csum_start} and \field{csum_offset}.

   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
-been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
+been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
+is not set, the driver MAY use \field{hdr_len} only as a hint about the
   transport header size.
-The driver MUST NOT rely on \field{hdr_len} to be correct.
+
+If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the 
driver
+MUST NOT rely on \field{hdr_len} to be correct.

I think we should keep the above description as-is. For whatever case,
the driver must not trust the metadata set by the device and must
perform necessary sanity tests on them.


My idea is to keep the current description as it is,
but to emphasize in the next version:
"If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
the driver MAY treat the \field{hdr_len} as the length of the
protocol header inside the first descriptor."


Just to be clear, you suggest using

"If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
transport header size.

The driver MUST NOT rely on \field{hdr_len} to be correct.

If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
the driver MAY treat the \field{hdr_len} as the length of the
protocol header inside the first descriptor."


Yes. I will use the above description to make it clearer in the next version.




(Maybe "...the driver MAY use \field{hdr_len} as a hint about the length
of the protocol header..."? It's still not reliable, right?)


\field{hdr_len} is unreliable when VIRTIO_NET_F_SPLIT_HEADER is not negotiated.


If VIRTIO_NET_F_SPLIT_HEADER is negotiated, "split header" MAY perform the split
from the IP layer, so the protocol header and the transport header are 
different.

So I think the "...the driver MAY use \field{hdr_len} only as a hint about the
transport header size..." paragraph can be left as-is.
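For illustration only (struct layout and names assumed, not spec text): a driver-side helper showing how hdr_len becomes usable once the device sets the split-header flag:

```c
#include <stdint.h>

#define VIRTIO_NET_HDR_F_SPLIT_HEADER 8 /* flag value from the patch */

/* Hypothetical driver-side view (names and layout assumed): when the
 * device set the SPLIT_HEADER flag, hdr_len gives the protocol-header
 * bytes held in the first descriptor's buffer, and the payload starts
 * in the second buffer (a page). Without the flag, hdr_len is a hint
 * only and must not be relied upon. */
struct rx_hdr {
	uint8_t flags;
	uint16_t hdr_len;
};

/* Returns the payload bytes in the second buffer, or -1 if the header
 * was not split and the driver must parse the packet itself. */
static int split_payload_len(const struct rx_hdr *h, int total_len)
{
	if (!(h->flags & VIRTIO_NET_HDR_F_SPLIT_HEADER))
		return -1; /* hdr_len is only a hint here */
	return total_len - h->hdr_len;
}
```

This mirrors the wording agreed above: the MAY-trust branch exists only when the flag is set; the fallback branch keeps the "MUST NOT rely on hdr_len" behaviour.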




Re: [virtio-dev] Re: [PATCH] virtio_net: support split header

2022-08-10 Thread Heng Qi



[virtio-dev] [PATCH v7] virtio_net: support split header

2022-08-16 Thread Heng Qi
From: Xuan Zhuo 

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1st descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads reside
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Heng Qi 
Reviewed-by: Kangjie Xu 
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. Specify what the device does if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Specify what the device does if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of the used length if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case 
of successful negotiation.

 conformance.tex |   2 ++
 content.tex | 102 
 2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / 
Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / 
Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
 channel.
 
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+the transport header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
#define VIRTIO_NET_HDR_F_NEEDS_CSUM  1
#define VIRTIO_NET_HDR_F_DATA_VALID  2
 #define VIRTIO_NET_HDR_F_RSC_INFO  4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
 u8 flags;
#define VIRTIO_NET_HDR_GSO_NONE  0
 #define VIRTIO_N

Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-08-30 Thread Heng Qi

On 2022/8/25 10:22 PM, Cornelia Huck wrote:

On Tue, Aug 16 2022, Heng Qi  wrote:


From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1st descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads reside
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. Specify what the device does if it does not perform header split on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Specify what the device does if it does not perform header split on a packet.

v5:
1. Determine when hdr_len is trustworthy during rx processing
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of the used length if the first descriptor is 
skipped in the mergeable case

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. IP alignment is not handled uniformly, so it is not included
4. Make it clear that the device must support the four types when 
negotiation succeeds.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

I do not have any further comments on the change, let's see what the
networking folks think.


Okay. Thanks for your review.



[Do we require patches to be posted to virtio-comment, or is virtio-dev
enough? I'm a bit unsure right now.]


We can then consider posting patches to virtio-comment.


Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-08-30 Thread Heng Qi


On 2022/8/16 5:34 PM, Heng Qi wrote:

...

Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-09-01 Thread Heng Qi

On 2022/8/16 5:34 PM, Heng Qi wrote:

...

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-02 Thread Heng Qi

On 2022/9/2 2:21 PM, Jason Wang wrote:

On Tue, Aug 16, 2022 at 5:35 PM Heng Qi  wrote:

...

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-02 Thread Heng Qi


On 2022/9/2 2:21 PM, Jason Wang wrote:

On Tue, Aug 16, 2022 at 5:35 PM Heng Qi  wrote:

...

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-02 Thread Heng Qi


On 2022/9/2 2:41 PM, Michael S. Tsirkin wrote:

On Fri, Sep 02, 2022 at 02:21:04PM +0800, Jason Wang wrote:

On Tue, Aug 16, 2022 at 5:35 PM Heng Qi  wrote:

...

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-05 Thread Heng Qi


On 2022/9/5 3:52 PM, Xuan Zhuo wrote:

On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin"  wrote:

On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:

When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
the driver requires that the buffers submitted to receiveq
MUST be composed of at least two descriptors,
which means that each buffer the device gets is a descriptor chain,
even if the device does not split the header for some packets.

To store a packet in the descriptor chain without header splitting
by the device, the device MUST start with the first descriptor of
the descriptor chain to store the packet, and MUST NOT set the
VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.

Thanks.

Descriptor chains will hurt performance badly.

As I understand it, the reasons for the performance impact here are:
1. Two buffers are used.
2. One buffer occupies two descs.

This matches my understanding of the mergeable case: there we also need to
pack the packet into two buffers, and a packet eventually occupies two descs.



How about simply making this feature depend on mergeable buffers?
Then we have a separate buffer for the header and
this works cleanly.


Under mergeable, each buffer is independent, and the split header requires two
unequal descs.

If we implement it based on mergeable, then consider the scenario of TCP
zerocopy: when we fill the receive vq, each buffer is a separate page, and if
we use a separate buffer to hold the header, that is wasteful; we may have to
copy at the driver layer.

@Qi Do you think there will be other problems with this approach?

Thanks.


When we think about specs, we shouldn't be too distracted by the implementation.

But when we did think about this: suppose the driver fills the virtqueue by
page in mergeable mode. In order to use an XDP program, the driver usually
takes the beginning of a single page as the headroom and fills the rest of the
page into the virtqueue. Therefore, the empty buffer obtained by the device is
always smaller than a page when we implement split header on top of this mode;
that is, the payload the driver finally gets is offset from the beginning of
the page. This does not enjoy the benefits of zero copy.

At the same time, since the header is always only a bit more than 100 bytes,
a whole page occupied by the header is a waste of buffer space.

Thanks.


Re: [virtio-dev] [PATCH v7] virtio_net: support split header

2022-09-07 Thread Heng Qi


On 2022/8/16 5:34 PM, Heng Qi wrote:

From: Xuan Zhuo

The purpose of this feature is to split the header and the payload of
the packet.

|receive buffer|
|   0th descriptor | 1th descriptor|
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo
Signed-off-by: Heng Qi
Reviewed-by: Kangjie Xu
---
v7:
1. Fix some presentation issues.
2. Use "split transport header". @Jason Wang
3. Clarify some paragraphs. @Cornelia Huck
4. determine the device what to do if it does not perform header split 
on a packet.

v6:
1. Fix some syntax issues. @Cornelia Huck
2. Clarify some paragraphs. @Cornelia Huck
3. Determine the device what to do if it does not perform header split 
on a packet.

v5:
1. Determine when hdr_len is credible in the process of rx
2. Clean up the use of buffers and descriptors
3. Clarify the meaning of used lenght if the first descriptor is 
skipped in the case of merge

v4:
1. fix typo @Cornelia Huck @Jason Wang
2. do not split header for IP fragmentation packet. @Jason Wang

v3:
1. Fix some syntax issues
2. Fix some terminology issues
3. It is not unified with ip alignment, so ip alignment is not included
4. Make it clear that the device must support four types, in the case 
of successful negotiation.

  conformance.tex |   2 ++
  content.tex | 102 
  2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Offloads State Configuration / Setting Offloads State}
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) }
  \item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}

@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / 
Conformance Targets}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Automatic receive steering in multiqueue mode}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
  \item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / 
Control Virtqueue / Split Transport Header}
  \end{itemize}
  
  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}

diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / 
Network Device / Feature bits
  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
  channel.
  
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting

+the transport header and the payload.
+
  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
  
  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.

@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device 
Types / Network Device
  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
VIRTIO_NET_F_HOST_TSO6.
  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
  \end{description}
  
  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}

@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / 
Network Device / Device O
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
  #define VIRTIO_NET_HDR_F_DATA_VALID    2
  #define VIRTIO_NET_HDR_F_RSC_INFO      4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
  u8 

[virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-09 Thread Heng Qi




On 2022/9/5 4:27 AM, Michael S. Tsirkin wrote:

On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:

We need to clarify that the purpose of header splitting is to allow all payloads
to be placed independently in a page, which is beneficial for the zerocopy
implemented by the upper layer.

absolutely, pls add motivation.


If the driver does not enforce that the buffers submitted to the receiveq MUST
be composed of at least two descriptors, then header splitting will become 
meaningless,
or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at 
this time.


Thanks.




This seems very narrow and unnecessarily wasteful of descriptors.
What is wrong in this:

.. 

seems to achieve the goal of data in a separate page without
using extra descriptors.

thus my proposal to replace the requirement of a separate
descriptor with an offset of data from beginning of
buffer that driver sets.



We have carefully considered your suggestion.

We refer to spec v7 and earlier as scheme A for short. Review scheme A 
below:


| receive buffer |
| 0th descriptor | 1st descriptor |
| virtnet hdr | mac | ip hdr | tcp hdr |<-- hold -->| payload |

We use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that every payload can be placed
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Scheme A better solves the problems of headroom, tailroom and memory
waste, but as you said, this solution relies on the descriptor chain.


Our rethinking approach is no longer based on or using descriptor chain.

We refer to your proposed offset-based scheme as scheme B:

As you suggested, scheme B gives the device a buffer, using offset to 
indicate where to place the payload like this:


.. 

But how do we allocate this buffer? Since we want the payload to be placed
on a separate page, the method we consider is to have the driver directly
allocate two pages of contiguous memory.


Then the beginning of this contiguous memory is used to store the
headroom, and the contiguous memory after the headroom is directly
handed over to the device, similar to the following:

<---------------- receive buffer (2 pages) ---------------->
<<------- first page --------><------- second page ------->>
<headroom>< header ... (hold) ><         payload           >


Based on your previous suggestion, we also considered another new scheme C.

This scheme is implemented based on mergeable buffer, filling a separate 
page each time.


If the split header is negotiated and the packet can be successfully 
split by the device, the device needs to find at least two buffers, 
namely two pages, one for the virtio-net header and transport header, 
and the other for the data payload. Like the following:


| receive buffer1(page) | receive buffer2 (page) |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

At the same time, if XDP is considered, then the device needs to add
headroom at the beginning of receive buffer1 when receiving packets, so
that the driver can run programs similar to XDP. In order to solve
this problem, scheme C can introduce an offset, which requires the
device to write data starting from the offset position in receive
buffer1, like the following:


|             receive buffer1 (page)             | receive buffer2 (page) |
| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr |<-- hold -->| payload |


Then we simply compare the advantages and disadvantages of scheme A(spec 
v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable 
buffer):


1. desc chain:

- A depends on the descriptor chain;
- B, C do not depend on the descriptor chain.

2. page alloc

- B fills two consecutive pages, which causes a great waste of memory
for small packets such as ARP;
- C fills a single page, slightly better than B.

3. Memory waste:

- The memory waste of scheme A is mainly the 0th descriptor that is
skipped by the device;
- When scheme B and scheme C successfully split the header, there is a
huge waste of the first page, but the first page can be quickly
released by copying.

4. headroom

- The headrooms of scheme A and scheme B are reserved;
- Scheme C requires the driver to set the offset so that the device
skips it when using receive buffer1.

5. tailroom

- When splitting the header, the skb usually needs to store each
independent page in the non-linear data area based on shinfo.
- The tailroom of scheme A is reserved by itself;
- Scheme B requires the driver to set a reserved padding area for the
first receive buffer (2 pages) to use shinfo when the split header is
not successfully executed;
- Scheme C requires the driver to set max_len for the first receive
buffer (page).



Which scheme do you prefer?

---

Thanks.


-
To unsubscribe, e-mail: virtio-dev-unsubsc

Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-09 Thread Heng Qi
On Sun, Sep 04, 2022 at 04:27:38PM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> > We need to clarify that the purpose of header splitting is to allow all
> > payloads to be placed independently in a page, which is beneficial for the
> > zerocopy implemented by the upper layer.
> 
> absolutely, pls add motivation.
> 
> > If the driver does not enforce that the buffers submitted to the receiveq 
> > MUST
> > be composed of at least two descriptors, then header splitting will become 
> > meaningless,
> > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated 
> > at this time.
> > 
> > 
> > Thanks.
> > 
> > 
> 
> 
> This seems very narrow and unnecessarily wasteful of descriptors.
> What is wrong in this:
> 
> .. 
> 
> seems to achieve the goal of data in a separate page without
> using extra descriptors.
> 
> thus my proposal to replace the requirement of a separate
> descriptor with an offset of data from beginning of
> buffer that driver sets.
>


We have carefully considered your suggestion. 

Let's summarize the schemes we've considered before and now.

1. Scheme A ( refer to spec v7 )

We refer to spec v7 and earlier as scheme A for short. Review scheme A below: 
| receive buffer| 
|  0th descriptor  | 1st descriptor | 
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|  payload   | 

We use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be put
independently in a page, which is very beneficial for the zerocopy 
implemented by the upper layer. 

Scheme A better solves the problem of headroom, tailroom and
memory waste, but as you said, this solution relies on descriptor chain. 

2. Scheme B ( refer to your suggestion )

Our rethought approach is no longer based on the descriptor chain.
 
We refer to your proposed offset-based scheme as scheme B.
As you suggested, scheme B gives the device a buffer, using offset to
indicate where to place the payload. Like this: 

..  

But how do we allocate this buffer?
Since we want the payload to be placed on a separate page, the method
we consider is to have the driver directly allocate two pages of contiguous memory.

Then the beginning of this contiguous memory is used to store the headroom,
and the contiguous memory after the headroom is directly handed over to the 
device.
Similar to the following: 

[<------------ receive buffer (2 pages) ------------>]
[<------ first page ------><------ second page ----->]
[<headroom> ..            <        payload          >]
     ^          ^
     |          |
     |          pointer handed to the device
     |
     driver-reserved; the later part is filled by the device

3. Scheme C (we sent this scheme to you on September 7th; maybe you missed
it.)

Based on your previous suggestion, we also considered another new scheme C. 
This scheme is implemented based on mergeable buffer, filling a separate page 
each time. 

If the split header is negotiated and the packet can be successfully split by 
the device,
the device needs to find at least two buffers, namely two pages, one for the 
virtio-net header
and transport header, and the other for the payload. Like the following: 

|          receive buffer1 (page)          | receive buffer2 (page) |
| virtnet hdr | mac | ip hdr | tcp hdr |<-- hold -->|   payload    |

At the same time, if XDP is considered, then the device needs to add headroom 
at the
beginning of receive buffer1 when receiving packets, so that the driver can 
process
programs similar to XDP. 

In order to solve this problem, scheme C introduces an offset, which requires
the device to write data starting from the offset in receive buffer1, like the
following:

|             receive buffer1 (page)             | receive buffer2 (page) |
| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr |<-- hold -->| payload |
^
|
pointer to device


4. Summarize

Then we simply compare the advantages and disadvantages of scheme A(spec v7),
scheme B (offset buffer(2 pages)) and scheme C (based on mergeable buffer): 

1). descriptor chain: 
 i)  A depends on the descriptor chain;
 ii) B, C do not depend on the descriptor chain. 

2). page alloc 
 i) B fills with two consecutive pages, which causes a great waste of memory
for small packets such as ARP;
 ii) C fills with a single page, slightly better than B. 

3). Memory waste: 
 i) The memory waste of scheme A is mainly the 0th descriptor
that is skipped by the device;
 ii) When sch

Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-13 Thread Heng Qi
On Fri, Sep 09, 2022 at 08:47:57PM +0800, Xuan Zhuo wrote:
> 
> hi
>Qi also sent another copy of the same email today. Due to some email client
>problems, that email has some formatting confusion, so we can discuss
>under the other one.
> 
>https://lists.oasis-open.org/archives/virtio-dev/202209/msg00066.html
> 

Yes. Due to some formatting issues with the mail client, I resent this new email,
which should be in a clearer style for your review.

Do you have more questions about the contents of this new email?

Looking forward to your comments.

Thanks.

> 




Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header

2022-09-27 Thread Heng Qi




On 2022/9/28 5:35 AM, Michael S. Tsirkin wrote:

On Wed, Sep 14, 2022 at 11:34:43AM +0800, Jason Wang wrote:

On 2022/9/9 20:38, Xuan Zhuo wrote:

On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin"  wrote:

On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:

On 2022/9/5 4:27 AM, Michael S. Tsirkin wrote:

On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:

We need to clarify that the purpose of header splitting is to allow all payloads
to be placed independently in a page, which is beneficial for the zerocopy
implemented by the upper layer.

absolutely, pls add motivation.


If the driver does not enforce that the buffers submitted to the receiveq MUST
be composed of at least two descriptors, then header splitting will become 
meaningless,
or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at 
this time.


Thanks.



This seems very narrow and unnecessarily wasteful of descriptors.
What is wrong in this:

.. 

seems to achieve the goal of data in a separate page without
using extra descriptors.

thus my proposal to replace the requirement of a separate
descriptor with an offset of data from beginning of
buffer that driver sets.



We have carefully considered your suggestion.

We refer to spec v7 and earlier as scheme A for short. Review scheme A
below:

| receive buffer |
| 0th descriptor | 1st descriptor |
| virtnet hdr | mac | ip hdr | tcp hdr |<-- hold -->| payload |

We use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that every payload can be placed
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Scheme A better solves the problems of headroom, tailroom and memory waste,
but as you said, this solution relies on the descriptor chain.

Our rethinking approach is no longer based on or using descriptor chain.

We refer to your proposed offset-based scheme as scheme B:

As you suggested, scheme B gives the device a buffer, using offset to
indicate where to place the payload like this:

.. 

But how do we allocate this buffer? Since we want the payload to be placed on
a separate page, the method we consider is to have the driver directly
allocate two pages of contiguous memory.

Then the beginning of this contiguous memory is used to store the headroom,
and the contiguous memory after the headroom is directly handed over to the
device, similar to the following:

<---------------- receive buffer (2 pages) ---------------->
<<------- first page --------><------- second page ------->>
<headroom>< header ... (hold) ><         payload           >

Based on your previous suggestion, we also considered another new scheme C.

This scheme is implemented based on mergeable buffer, filling a separate
page each time.

If the split header is negotiated and the packet can be successfully split
by the device, the device needs to find at least two buffers, namely two
pages, one for the virtio-net header and transport header, and the other for
the data payload. Like the following:

| receive buffer1(page) | receive buffer2 (page) |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

At the same time, if XDP is considered, then the device needs to add
headroom at the beginning of receive buffer1 when receiving packets, so that
the driver can run programs similar to XDP. In order to solve this
problem, scheme C can introduce an offset, which requires the device to
write data starting from the offset position in receive buffer1, like the following:

|             receive buffer1 (page)             | receive buffer2 (page) |
| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr |<-- hold -->| payload |

And in fact, B and C both use an offset now, right?

B: offset is used to get the position to place the payload.
C: The offset is used to reserve some space for the device, which the driver can
 use as headroom.

 In order to make the payload page-aligned, we can only hand over the entire
 page to the device, so we cannot reserve some headroom in advance.


For C, it might be better to do some tweaking, since mergeable buffers don't
forbid using a descriptor chain as a single buffer.

So if it's a descriptor chain, we get back to method A by placing the
payload in a dedicated buffer. If it's not, we place the payload in an
adjacent buffer.

Thanks

Let's find a way so devices do not care how descriptors are laid out.


Hi, I'm not sure if you're replying to the wrong email (now in v7 thread),
if not, I'm guessing you mean we continue to discuss a way for devices
to not care how descriptors are laid out based on split header v7,
and no longer consider the new options B and C?


Thanks.




Then we simply compare the advantages and disadvantages of scheme A(spec
v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
buffer):

1. desc chain:

- A

Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header

2022-10-20 Thread Heng Qi
On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they 
> > > > > > conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't find how it can conflict with any_layout. The device 
> > > > > can just
> > > > > not split the header when the layout doesn't fit for header 
> > > > > splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > > indicates to both the device and the driver that no
> > > > assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> 
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
> 
> 1) the driver know what kind of packet it can receive
> 2) protocol have fixed length of the header
> 
> This is probably not true consider:
> 
> 1) TCP and UDP have different header length
> 2) IPv6 has an variable length of the header
> 
> 
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> 
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
> 
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
> 
> Is the device expected to
> 
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
> 
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out 
> > whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first 
> > \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
> 
> I had two questions
> 
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
> 
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
> 
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
> 


We've talked a bit more about split header so far, but there still seem to
be some issues, so let's recap.

I. Method Discussion Review

In order to adapt to Eric's TCP receive interface to achieve zero copy,
the header and payload are required to be stored separately, and the payload is
stored in a page-aligned way. Therefore, we have discussed several options
for split header as follows:

1: method A (depends on the descriptor chain)
| receive buffer| 
|  0th descriptor

[virtio-dev] Re: [PATCH 1/2] virtio_net: fix syntax errors

2022-11-09 Thread Heng Qi




On 2022/11/9 9:23 PM, Michael S. Tsirkin wrote:

On Wed, Nov 09, 2022 at 07:35:14PM +0800, Heng Qi wrote:

Please ignore this email.

Thanks.

ok. if 2/2 is still relevant pls post it separately.


I sent a new patch set with a cover letter; please see 
https://lists.oasis-open.org/archives/virtio-dev/202211/msg00025.html.








[virtio-dev] Re: [PATCH v3] virtio_net: support inner header hash

2022-12-18 Thread Heng Qi




On 2022/12/16 8:49 PM, Michael S. Tsirkin wrote:

On Mon, Dec 05, 2022 at 02:36:39PM +0800, Heng Qi wrote:

@@ -4005,6 +4159,24 @@ \subsubsection{Processing of Incoming 
Packets}\label{sec:Device Types / Network
  #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
  \end{lstlisting}
  
+If \field{hash_report} differs from VIRTIO_NET_HASH_REPORT_NONE,
+\field{hash_report_tunnel} can report the type of the tunnel-encapsulated
+packet to the driver over the inner header hash calculation.
+Possible values that the device can report in \field{hash_report_tunnel}
+are defined below:
+
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_REPORT_GRE 1
+#define VIRTIO_NET_HASH_REPORT_VXLAN   2
+#define VIRTIO_NET_HASH_REPORT_GENEVE  3
+\end{lstlisting}
+
+The values VIRTIO_NET_HASH_REPORT_GRE, VIRTIO_NET_HASH_REPORT_VXLAN and
+VIRTIO_NET_HASH_REPORT_GENEVE correspond to VIRTIO_NET_HASH_TYPE_GRE_INNER,
+VIRTIO_NET_HASH_TYPE_VXLAN_INNER and VIRTIO_NET_HASH_TYPE_GENEVE_INNER bits
+of supported hash types defined in respectively
+\ref{sec:Device Types / Network Device / Device Operation / Processing of 
Incoming Packets / Hash calculation for incoming packets / Supported/enabled 
hash types}.
+
  \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / 
Device Operation / Control Virtqueue}
  
  The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is

If the new feature flag is negotiated, we need to spell out more clearly what
the rules are for packets that are not encapsulated. Is hash_report_tunnel set to 0


Yes, we should. When the _TUNNEL feature is negotiated, for 
non-encapsulated packets we set \field{hash_report_tunnel} to 0.



then? Another comment is that we keep repeating GRE/VXLAN/GENEVE too
many times. Let's add a paragraph defining a concept e.g. a "tunnel" or
"tunneled packets", explaining how they are handled at a high level,
and then just refer to the tunnel everywhere.
Let's also add external references to specifications documenting the
relevant tunnel types.


Ok, I'll try to make this clear.

Thanks.





--
2.19.1.6.gb485710b






Re: [virtio-dev] [PATCH] virtio_net: support low and high rate of notification coalescing sets

2022-12-21 Thread Heng Qi




On 2022/12/21 7:48 PM, Alvaro Karsz wrote:

Hi,


I want to know which one is better than NetDim(Coalesce Adaptive) in driver.

I know Heng Qi's work in this.

Thanks


Why choose? we can have both.
ethtool can handle both pkt_rate_low/pkt_rate_high and
use_adaptive_rx_coalesce/use_adaptive_tx_coalesce.

The adaptive algorithm can even use this feature to set low and high
coalescing sets.


Hi, all.

NetDIM is currently a mature library in the kernel. It uses the number
of bytes, PPS and interrupt rate as samples to make an action, and it is
performed independently in the tx and rx directions. Also, there will be
an extra worker to help us send the configuration based on the control
queue, to avoid interrupting the softirq. Although the method of the
pkt_rate_{low, high} params does not conflict with DIM, I have some
doubts: if the coalescing parameters of the rx and tx directions are
determined at the same time based on pkt_rate alone, will this be a
problem?

Thanks.



Alvaro






Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash

2023-01-09 Thread Heng Qi
On Tue, Jan 10, 2023 at 12:57:38AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 10, 2023 at 12:25:02AM -0500, Michael S. Tsirkin wrote:
> > > This will give extra pressure on the management stack, e.g it requires
> > > the device to have an out of spec way for introspection.
> > > 
> > > Thanks
> > 
> > As I tried to explain this is already the case. Feature bits do not
> > describe device capabilities fully, some of them are in config space.
> 
> To be precise, this does not necessarily require introspection, but
> it does require management control over config space
> such as supported hash types just like it has control over feature bits.
> E.g. QEMU currently seems to hard-code these to
> #define VIRTIO_NET_RSS_SUPPORTED_HASHES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
>  VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
>  VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
>  VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
>  VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
>  VIRTIO_NET_RSS_HASH_TYPE_UDPv6 | \
>  VIRTIO_NET_RSS_HASH_TYPE_IP_EX | \
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
>  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)
> 
> but there's no reason not to give management control over these.

Yes, QEMU has requirements for live migration: the PCI config space will be
checked in get_pci_config_device(), and if src and dst are inconsistent, it
will report that the live migration failed. 
In fact, this is also done within our group. Live migration requires that
the two VMs have the same RSS configuration, otherwise the migration will fail.

Therefore, it seems that we can simplify the description of 
VIRTIO_NET_F_HASH_TUNNEL to
"[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash for 
tunnel-encapsulated packets.",
and use the different hash_types to let the migration determine whether it can 
succeed.

Thanks.

> 
> -- 
> MST




[virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash

2023-01-09 Thread Heng Qi




On 2023/1/9 7:39 PM, Michael S. Tsirkin wrote:

Btw this "are defined below" all over the place is just contributing
to making the spec unnecessarily verbose. A simple "are:" will do.


Sure. I'll fix it in the next version.

Thanks.





[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v7] virtio-net: support inner header hash

2023-01-10 Thread Heng Qi




On 2023/1/10 3:26 PM, Heng Qi wrote:

On Tue, Jan 10, 2023 at 12:57:38AM -0500, Michael S. Tsirkin wrote:

On Tue, Jan 10, 2023 at 12:25:02AM -0500, Michael S. Tsirkin wrote:

This will give extra pressure on the management stack, e.g it requires
the device to have an out of spec way for introspection.

Thanks

As I tried to explain this is already the case. Feature bits do not
describe device capabilities fully, some of them are in config space.

To be precise, this does not necessarily require introspection, but
it does require management control over config space
such as supported hash types just like it has control over feature bits.
E.g. QEMU currently seems to hard-code these to
#define VIRTIO_NET_RSS_SUPPORTED_HASHES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
  VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
  VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
  VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
  VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
  VIRTIO_NET_RSS_HASH_TYPE_UDPv6 | \
  VIRTIO_NET_RSS_HASH_TYPE_IP_EX | \
  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX | \
  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX)

but there's no reason not to give management control over these.

Yes, QEMU has requirements for live migration: the PCI config space will be
checked in get_pci_config_device(), and if src and dst are inconsistent, it
will report that the live migration failed.


To be clearer, I mean \field{supported_hash_types} in the structure 
virtio_net_config.


Thanks.


In fact, this is also done within our group. Live migration requires that
the two VMs have the same rss configuration, otherwise the migration will fail.

Therefore, it seems that we can regularize the description of 
VIRTIO_NET_F_HASH_TUNNEL into
"[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner header hash for 
tunnel-encapsulated packets.",
and use different hash_types to help the migration determine whether it can 
succeed.

Thanks.


--
MST

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/






[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH v7] virtio-net: support inner header hash

2023-01-30 Thread Heng Qi
On Wed, Jan 18, 2023 at 11:45:39PM +, Parav Pandit wrote:
> 
> 
> > From: virtio-dev@lists.oasis-open.org  
> > Sent: Wednesday, January 4, 2023 2:14 AM
> 
> > If the tunnel is used to encapsulate the packets, the hash calculated using 
> > the
> > outer header of the receive packets is always fixed for the same flow 
> > packets,
> > i.e. they will be steered to the same receive queue.
> 
> > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner
> > +header hash for GRE, VXLAN and GENEVE tunnel-encapsulated packets.
> > +
> A device may not support all 3 at the same time.
> Please remove the description of the tunneling protocols from here.
> Just say the device supports inner header hash ...

Sorry for the late reply due to vacation.

Good idea, Michael suggested doing the same. But we also discussed this issue:
Earlier, we used a feature bit to force devices to support GRE and VXLAN (see
https://lists.oasis-open.org/archives/virtio-dev/202211/msg00183.html ).
Later Jason suggested using VIRTIO_NET_F_HASH_TUNNEL instead of 
VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER (see
https://lists.oasis-open.org/archives/virtio-dev/202212/msg00014.html ). Now 
Michael proposes to remove this list (see
https://lists.oasis-open.org/archives/virtio-dev/202301/msg00079.html ), 
because migration uses the feature bit and hash types to determine whether
the live migration can succeed.
> 
> An additional bitmap somewhere else should indicate the supported hashes over 
> different tunneling types.
> 

Yes, we use \field{supported_hash_types} to declare supported hash types.

> [...]
> 
> > +The device calculates the hash on the inner IPv4 packet of an
> > +encapsulated packet according to 'Enabled hash types' bitmask as follows:
> > +\begin{itemize}
> > +  \item If VIRTIO_NET_HASH_TYPE_TCPv4 is set and the encapsulated packet
> > has an inner
> > +   TCPv4 header, the hash is calculated over the following fields:
> > +\begin{itemize}
> > +  \item inner Source IP address
> > +  \item inner Destination IP address
> > +  \item inner Source TCP port
> > +  \item inner Destination TCP port
> > +\end{itemize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_UDPv4 is set and the encapsulated
> > packet has an
> > +   inner UDPv4 header, the hash is calculated over the following fields:
> > +\begin{itemize}
> > +  \item inner Source IP address
> > +  \item inner Destination IP address
> > +  \item inner Source UDP port
> > +  \item inner Destination UDP port
> > +\end{itemize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_IPv4 is set, the hash is calculated 
> > over
> > the
> > +   following fields:
> > +\begin{itemize}
> > +  \item inner Source IP address
> > +  \item inner Destination IP address
> > +\end{itemize}
> > +  \item Else the device does not calculate the hash \end{itemize}
> > +
> > +The device calculates the hash on the inner IPv6 packet without an
> > +extension header of an encapsulated packet according to 'Enabled hash 
> > types'
> > bitmask as follows:
> > +\begin{itemize}
> > +  \item If VIRTIO_NET_HASH_TYPE_TCPv6 is set and the encapsulated packet
> > has an inner
> > +   TCPv6 header, the hash is calculated over the following fields:
> > +\begin{itemize}
> > +  \item inner Source IPv6 address
> > +  \item inner Destination IPv6 address
> > +  \item inner Source TCP port
> > +  \item inner Destination TCP port
> > +\end{itemize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_UDPv6 is set and the encapsulated
> > packet has an
> > +   inner UDPv6 header, the hash is calculated over the following fields:
> > +\begin{itemize}
> > +  \item inner Source IPv6 address
> > +  \item inner Destination IPv6 address
> > +  \item inner Source UDP port
> > +  \item inner Destination UDP port
> > +\end{itemize}
> > +  \item Else if VIRTIO_NET_HASH_TYPE_IPv6 is set, the hash is calculated 
> > over
> > the
> > +   following fields:
> > +\begin{itemize}
> > +  \item inner Source IPv6 address
> > +  \item inner Destination IPv6 address
> > +\end{itemize}
> > +  \item Else the device does not calculate the hash \end{itemize}
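The precedence in the quoted lists (TCP tuple first, then UDP, then the bare IP address pair, else no hash) can be sketched as follows. This is a non-normative illustration of the selection logic only; the function name and the boolean-flag interface are mine, not from the spec:

```python
# Illustrative sketch of the inner-header hash field selection quoted above.
# tcp_en/udp_en/ip_en say whether the corresponding VIRTIO_NET_HASH_TYPE_*
# bit is set in 'Enabled hash types'; inner_l4 is the inner transport header.
def inner_ipv4_hash_fields(tcp_en, udp_en, ip_en, inner_l4):
    """Return the tuple of inner fields hashed, or None if no hash."""
    ip_pair = ("inner src IP", "inner dst IP")
    if tcp_en and inner_l4 == "tcp":
        return ip_pair + ("inner src TCP port", "inner dst TCP port")
    if udp_en and inner_l4 == "udp":
        return ip_pair + ("inner src UDP port", "inner dst UDP port")
    if ip_en:
        return ip_pair  # address pair only
    return None  # device does not calculate the hash
```

The same cascade applies per address family; only the field names change for IPv6.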
> > +
> > +The device calculates the hash on the inner IPv6 packet with an
> > +extension header of an encapsulated packet according to 'Enabled hash 
> > types'
> > bitmask as follows:
> > +\begin{itemize}
> > +  \item If VIRTIO_NET_HASH_TYPE_TCP_EX is set and the encapsulated packet
> > has an inner
> > +   TCPv6 header, the hash is calculated over the following fields:
> > +\begin{itemize}
> > +  \item Home address from the home address option in the inner IPv6
> > destination
> > +  options header. If the inner extension header is not 
> > present, use the
> > +  inner Source IPv6 address.
> > +  \item I

[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH v7] virtio-net: support inner header hash

2023-02-01 Thread Heng Qi




On 2023/2/2 11:55 AM, Parav Pandit wrote:

From: virtio-comm...@lists.oasis-open.org  On Behalf Of Michael S. Tsirkin
Sent: Wednesday, February 1, 2023 1:57 AM

Also, this patch is adding two functionalities.
1. Inner header hash calculation of existing already defined hash
types 2. outer header hash for new type for GRE,VXLAN,GENEVE.
#1 should be in 1st patch.
#2 should be in 2nd patch.
This is better to review.

Parav, you came to this discussion pretty late. Asking to split up the patch
when it's at v1/v2 is OK. Asking after others have already reviewed v6 is not:
you are making review easier for yourself but re-review harder for others who
already have a mind map of the patch.

In this case unless we really want to enable these separately (and frankly I 
don't
see a good reason to) then splitting it up makes review more confusing.


No, there is no need to enable them separately.
It was hard to parse the new inner type decoding addition, which has close to
zero relation to the outer headers.

As you say, it has some history; I don't have a strong opinion on splitting.
But going forward, in subsequent work it is better to see logical changes in
multiple patches.


It seems that we don't need to emphasize the outer header hash; the behavior is
the same as usual when the inner header hash is not enabled.

Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] RE: [virtio-dev] [PATCH] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature

2023-02-06 Thread Heng Qi




On 2023/2/7 5:53 AM, Parav Pandit wrote:



From: virtio-dev@lists.oasis-open.org  On
Behalf Of Michael S. Tsirkin

On Mon, Feb 06, 2023 at 07:13:43PM +, Parav Pandit wrote:



From: virtio-dev@lists.oasis-open.org
 On Behalf Of Alvaro Karsz

This patch makes several improvements to the notification coalescing
feature,
including:

- Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx
   into a single struct, virtio_net_ctrl_coal, as they are identical.
- Emphasizing that the coalescing commands are best-effort.
- Defining the behavior of coalescing with regard to delivering
   notifications when a change occurs.


Patch needs to do one thing at a time.
Please split above into three patches.


Signed-off-by: Alvaro Karsz 
---
  device-types/net/description.tex | 40
++--
  1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/device-types/net/description.tex
b/device-types/net/description.tex
index 1741c79..2a98411 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1514,15 +1514,12 @@ \subsubsection{Control
Virtqueue}\label{sec:Device Types / Network Device / Devi  If the
VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can  send
control commands for dynamically changing the coalescing parameters.

-\begin{lstlisting}
-struct virtio_net_ctrl_coal_rx {
-le32 rx_max_packets;
-le32 rx_usecs;
-};
+Note: In general, these commands are best-effort: A device could
+send a notification even if it is not supposed to.


Please remove this note from this patch.
Instead of a note, we need to describe this as a device requirement.
We also need to describe the motivation for it.
We may want to say there may be jitter in notifications, but the device should not
be sending when it is not supposed to.

It's explicitly allowed:

split-ring.tex:The driver MUST handle spurious notifications from the device.
split-ring.tex:The device MUST handle spurious notifications from the driver.


The intent is to guide the device implementation toward fewer spurious
interrupts.
The best-effort wording says the device is free to implement a timer of any
granularity it chooses, which rather defeats the purpose of interrupt moderation.

So, both sides can handle/generate spurious notifications, but that shouldn't be
the baseline guidance.




I also have more description to add in this area with regard to GSO and LRO.
My humble suggestion is that we draft it jointly in a separate patch combining
these clarifications.

-struct virtio_net_ctrl_coal_tx {
-le32 tx_max_packets;
-le32 tx_usecs;
+\begin{lstlisting}
+struct virtio_net_ctrl_coal {
+le32 max_packets;
+le32 usecs;
  };


This is one good change to go as separate patch.


  #define VIRTIO_NET_CTRL_NOTF_COAL 6 @@ -1532,25 +1529,25 @@
\subsubsection{Control Virtqueue}\label{sec:Device Types / Network
Device / Devi

  Coalescing parameters:
  \begin{itemize}
-\item \field{rx_usecs}: Maximum number of usecs to delay a RX

notification.

-\item \field{tx_usecs}: Maximum number of usecs to delay a TX

notification.

-\item \field{rx_max_packets}: Maximum number of packets to receive
before a RX notification.
-\item \field{tx_max_packets}: Maximum number of packets to send
before a TX notification.
+\item \field{usecs} for RX: Maximum number of usecs to delay a RX
notification.
+\item \field{usecs} for TX: Maximum number of usecs to delay a TX
notification.
+\item \field{max_packets} for RX: Maximum number of packets to
+receive
before a RX notification.
+\item \field{max_packets} for TX: Maximum number of packets to send
+before
a TX notification.
  \end{itemize}


s/for Rx/For receive virtqueue
s/for Tx/For transmit virtqueue

Which virtqueue? It says TX/RX pretty consistently in this text.
Changing to receive virtqueue/transmit virtqueue would be a big change and
frankly for a very modest gain in readability.
Rather maybe just say RX/TX where we describe virtqueue.


We describe the rest of the previous sections as transmitq, receiveq.
I would like to keep this section consistent with the rest of the Network device
section, and not just the notification coalescing section.


  The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
  \begin{enumerate}
-\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs}
and \field{tx_max_packets} parameters.
-\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs}
and \field{rx_max_packets} parameters.
+\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{usecs} and
\field{max_packets} parameters for TX.
+\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{usecs} and
\field{max_packets} parameters for RX.
  \end{enumerate}
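Since the consolidated struct is identical for both directions, only the command code (TX_SET vs. RX_SET) distinguishes transmit from receive. A sketch of packing the command payload follows; the struct layout and field order are taken from the quoted diff, while the helper function itself is illustrative:

```python
import struct

def pack_ctrl_coal(max_packets: int, usecs: int) -> bytes:
    # struct virtio_net_ctrl_coal { le32 max_packets; le32 usecs; };
    # '<' = little-endian, 'I' = 32-bit unsigned. One layout serves both
    # VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and VIRTIO_NET_CTRL_NOTF_COAL_RX_SET.
    return struct.pack("<II", max_packets, usecs)

# Example values from the RX Notifications paragraph below.
payload = pack_ctrl_coal(15, 10)  # max_packets=15, usecs=10
```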

  \subparagraph{RX Notifications}\label{sec:Device Types / Network
Device / Device Operation / Control Virtqueue / Notifications
Coalescing / RX Notifications}

  If, for example:
  \begin{itemize}
-\item \field{rx_usecs} = 10.
-\item \field{rx_max_packets} = 15.
+\item \field{usecs} = 10.
+\item \field{max_packets} = 15.
  

Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




On 2023/2/8 6:10 PM, Michael S. Tsirkin wrote:

On Wed, Feb 08, 2023 at 09:57:54AM +0800, Heng Qi wrote:

I think it's a good idea to do this on top of Alvaro's patch
unifying these two structures.

I saw Alvaro's patch, but it doesn't seem to be stable yet. Is there a good
way for me to unify the two structures, since a patch should only do one thing?

Problem is you were trying to change these existing structures too
so the patches conflicted. However I think at this point
we are in agreement on a new command with a new structure.
In this case there won't be a conflict as you are
not touching existing commands so no need for you
to depend on Alvaro's patch.


I get it.

Thanks.









Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




On 2023/2/9 1:53 AM, Alvaro Karsz wrote:

  > > From: Michael S. Tsirkin 

Sent: Wednesday, February 8, 2023 9:48 AM

On Wed, Feb 08, 2023 at 02:44:37PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:43 AM

On Wed, Feb 08, 2023 at 02:37:55PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Wednesday, February 8, 2023 9:18 AM

On Wed, Feb 08, 2023 at 07:30:34PM +0800, Heng Qi wrote:

I see two options.
1. Just have per VQ params. Software has the full knowledge
of in which it is

operating, and state remains at software level.

This effectively achieves both the mode.

2. Have a mode cmd,
Mode = (a) per device or (b) per VQ (c) disable After the
mode is set, driver can set per device or per VQ.

I find this more clear.

Thanks.


Rereading this I think I misunderstood the proposal.
Now we are burning memory on maintaining mode, and this
information is duplicated.


It is not maintained in the pci resident memory, so it doesn't hurt.


I'd say let's just add a new command COAL_QUEUE_SET with vqn as

parameter.

Existing commands are simply defined as a shortcut to running
COAL_QUEUE_SET on all tx/rx queues respectively.

Latest command dictates the parameters. To disable just set
everything to 0 (btw we should make this explicit in the spec,
but it can be

guessed from:

Upon reset, a device MUST initialize all coalescing parameters to 0.
)


Switching between the modes (per q vs per device) implicitly is
ambiguous,

and it only means the device may need to iterate.

Hmm, I feel it's only ambiguous because I failed to explain it well.


This state is either better maintained in sw by always having per
vq or have

clearly defined mode of what device should do.

Per Q is very common even for several years old devices.
Last time I counted, there were at least 15 such devices supporting it.

So actual usage wise, I practically see that most implementations
will end up

with per vq mode.

I would like to hear from Heng or Alvaro if they see any use of per device.


Right, so given this, most devices will be in per-queue mode all the
time. Why do you want a mode then? Just keep per queue.
Existing commands are kept around for compat but internally just
translate to per-queue.

Since the spec is not released, do we need to keep the compat?

It's been accepted for half a year so we can't say for sure no one built this.

That is likely, but we should have the ability to issue an Errata/ECN to correct
it, especially for an unreleased spec.


The way I propose is just a bit of firmware on device that scans all queues and
copies same parameters everywhere.

This scanning loop in sw appears cheaper to me than some embedded fw.
But it is not much of a concern.


Seems easier than worrying about this,
and we get disabling coalescing for free, which you wanted. With an extra mode
it's extra logic in the device fast path. Maybe it's cheap on the hardware side,
but in software it's an extra branch, not free.

The most performant data path wouldn't implement and read the extra mode.
It is always fw that is going to program the same value, a per-queue value, or a
disable value in each Q, regardless of how we craft the CVQ cmd.

The sequence that bothers me is below.
1. The driver sets global params.
2. A few minutes later, the driver sets params for Q=1.

On this command, a device needs to decide: should Q = 2 to N
(a) keep working with the previous globals, or
(b) implicitly disable them, because per-Q params were set for one queue?

If it is (b),
when a command on Q object = 1 is issued, it affects other Q objects. <- This I
want to avoid.
A cmd that modifies an object should only modify that object.

If it is (a), it is mixed-mode operation, which is an ambiguous definition.

A better semantic is to define such change at device level and no extra cost in 
the data path.

I think that (a) is the way to go.
I don't think that we should work with operation modes at all.


I agree to keep the current global settings (VIRTIO_NET_F_NOTF_COAL and
its corresponding commands),
because our hardware team has limited resources for the control queue,
and they don't want to send a separate cmd for each queue when sending a
global setting cmd.

Then adding a VIRTIO_NET_F_PERQUEUE_NOTF_COAL or
VIRTIO_NET_F_VQ_NOTF_COAL feature bit and new commands for per-queue or
per-VQ settings looks better to me.

In my opinion:

We should have 2 features:
VIRTIO_NET_F_PERQUEUE_NOTF_COAL and VIRTIO_NET_F_NOTF_COAL.

VIRTIO_NET_F_PERQUEUE_NOTF_COAL sets per queue parameters, and
VIRTIO_NET_F_NOTF_COAL sets parameters for all queues.

VIRTIO_NET_F_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_RX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_TX_SET

VIRTIO_NET_F_PERQUEUE_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_TX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_RX_SET

We can see VIRTIO_NET_CTRL_NOTF_COAL_RX_SET as a virtio level shortcut
for sett

Re: [virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




On 2023/2/9 6:35 AM, Alvaro Karsz wrote:

From: Alvaro Karsz 
Sent: Wednesday, February 8, 2023 4:56 PM

Alvaro,
Do you know if any software used it? Can you get some real data?

I implemented this feature in our DPU, so at least 1 vendor is using this 
feature

But which software (virtio net driver) in which OS is using this?

Sorry, I'm not sure I understand your question.

The feature is implemented in the linux kernel
https://github.com/torvalds/linux/commit/699b045a8e43bd1063db4795be685bfd659649dc
So we'll always have kernel versions accepting this feature, if offered.


(I will add support for the per vq command of course).
I really don't know about other vendors..

You are suggesting to reserve the command and feature bit for safety, so, if we
reserve them, why not just use them? What do we lose here?


If it is used by some unknown software, only that sw breaks by using a
non-released spec.
If we change the definition, it may break that unknown sw.
If we know there is no such sw, we are better off with redefinition (by adding
vqn, and by removing tx,rx from it).


Not having this feature/command even complicates things now that we are
talking about removing the RX and TX from the per vq command: how do you
change parameters for all TX queues? For all RX queues? We'd need 2 special
indexes, so we now need le32 to hold the queue index.

No need for a special index.
How does a driver disable all queues or reset all queues? -> One by one.
So if a user wants to change all TXQs, sw can do it one by one by iterating TXQ
vqns.

Yes, but resetting the queues doesn't require a control command.
If a server has 64K queues, and a user wants to set all coalescing
parameters to X (maybe with ethtool), it will generate 64K control
commands...


At least our hardware design doesn't expect that.
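The trade-off being debated above (one group command vs. one command per queue) can be sketched from the driver's side. The command encoding and function names here are purely illustrative:

```python
def set_all_rx_coal(send_cmd, rx_vqns, max_packets, usecs, have_group_cmd):
    """Driver-side sketch: apply one set of coalescing parameters to every
    RX virtqueue. send_cmd stands in for submitting one control-queue
    command; returns the number of commands generated."""
    if have_group_cmd:
        # one VIRTIO_NET_CTRL_NOTF_COAL_RX_SET covers all RX queues
        send_cmd(("RX_SET", max_packets, usecs))
        return 1
    # otherwise one per-VQ command per queue: 64K queues -> 64K commands
    for vqn in rx_vqns:
        send_cmd(("VQ_SET", vqn, max_packets, usecs))
    return len(rx_vqns)
```

This is the argument for keeping the group (all-RX/all-TX) shortcut alongside the per-VQ command.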





[virtio-dev] Re: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




On 2023/2/9 4:52 AM, Michael S. Tsirkin wrote:

On Wed, Feb 08, 2023 at 07:53:09PM +0200, Alvaro Karsz wrote:

[...]

The sequence that bothers me is below.
1. driver set global params
2. few minutes later, now driver set param for Q=1

On this command, a device need to decide:
Should Q = 2 to N
(a) either work with previous globals, or
(b) because per Q was set for one queue, they rest of the queues implicitly 
disable it.

If it is (b),
When a command on Q object =1 is issued, it affects other Q objects. <- This I 
want to avoid.
A cmd that modifies the object, should only modify that object.

If it is (a), it is mixed mode operation, which is ambiguous definition.

A better semantic is to define such change at device level and no extra cost in 
the data path.

I think that (a) is the way to go.
I don't think that we should work with operation modes at all.

In my opinion:

We should have 2 features:
VIRTIO_NET_F_PERQUEUE_NOTF_COAL and VIRTIO_NET_F_NOTF_COAL.

VIRTIO_NET_F_PERQUEUE_NOTF_COAL sets per queue parameters, and
VIRTIO_NET_F_NOTF_COAL sets parameters for all queues.

VIRTIO_NET_F_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_RX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_TX_SET

VIRTIO_NET_F_PERQUEUE_NOTF_COAL has 2 commands:
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_TX_SET
 VIRTIO_NET_CTRL_NOTF_COAL_PER_QUEUE_RX_SET

We can see VIRTIO_NET_CTRL_NOTF_COAL_RX_SET as a virtio level shortcut
for setting all queues with one command, exactly as intended with
rx_qid= 0x, and without breaking devices following the current
spec.

The device's FW can decide if it stores parameters received with
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET in a global set, or if it iterates
through all queues, but IMO the best way is to iterate through all
queues.

Seems like

Re: [virtio-dev] RE: [virtio-comment] [PATCH] virtio-net: support per-queue coalescing moderation

2023-02-08 Thread Heng Qi




On 2023/2/8 11:04 PM, Parav Pandit wrote:



[...]

The sequence that bothers me is below.
1. driver set global params
2. few minutes later, now driver set param for Q=1

On this command, a device need to decide:
Should Q = 2 to N
(a) either work with previous globals, or
(b) because per Q was set for one queue, they rest of the queues implicitly 
disable it.

If it is (b),
When a command on Q object =1 is issued, it affects other Q objects. <- This I 
want to avoid.
A cmd that modifies the object, should only modify that object.

If it is (a), it is mixed mode operation, which is ambiguous definition.


I think it should be (a). I think we should blur the concept of mode;
there seems to be no mode here.
From the perspective of the device, it only needs to distinguish
commands and do what it should do.


Thanks.



A better semantic is to define such change at device level and no extra cost in 
the data path.




[virtio-dev] Re: [virtio-comment] [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-10 Thread Heng Qi




On 2023/2/10 6:16 PM, Alvaro Karsz wrote:

So, should we remove VIRTIO_NET_F_CTRL_VQ here, or fix VIRTIO_NET_F_HOST_ECN?

Ah good point.
But I think  VIRTIO_NET_F_VQ_NOTF_COAL should not depend on 
VIRTIO_NET_F_NOTF_COAL.
This way devices can drop the all-rx/all-tx commands if they want to.

We need to confirm this. If we make VIRTIO_NET_F_VQ_NOTF_COAL
independent of VIRTIO_NET_F_NOTF_COAL,
do we need to give vqn a special value so that the driver can also have
the fast path of sending all queues with global settings via
VIRTIO_NET_F_VQ_NOTF_COAL?

IMO we don't need a special vqn value.
A device that can modify all the vqs should offer VIRTIO_NET_F_NOTF_COAL.


That's clear, thanks for the quick reply.
Have a great weekend!





[virtio-dev] Re: [virtio-comment] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-11 Thread Heng Qi




On 2023/2/11 4:45 PM, Alvaro Karsz wrote:

Please add short description something like,

When the driver prefers to use per virtqueue notifications coalescing, and if 
queue group (transmit or receive) level notification coalescing is enabled, 
driver SHOULD first disable device level notification coalescing.
Or it should be,


I disagree here.
IMO "queue group level notification coalescing" is not something to
enable or disable, but a shortcut to set all TX/RX queues at once.
Why should the spec force a driver to "disable device level
notification coalescing" (I assume you mean send a
VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)?
What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET
command, and then a single queue traffic increases? why should it zero
the parameters to all other queues?


Hi, Alvaro! Thanks for your reply!

I think Parav refers more to the scenario where ethtool sets parameters at
the queue group level while netdim sets parameters for a single queue. In that
scenario, netdim really should determine the device's coalescing parameters,
and the queue-group-level parameters set by ethtool should be ignored (many
drivers are designed this way, such as mlx); that is, we need to give netdim
priority, because it has the ability to dynamically adjust parameters.
(However, I think this friendly constraint could also live in the driver
implementation.)

Of course, if we consider setting coalescing parameters at the queue group
level and the single queue level separately through ethtool,
then, as you said, we should not set any priority between them.

Back to reality, I think ethtool support for setting single-queue parameters
may come later, and it is thankless for users because of netdim.

Therefore, if our specification tends to be practical, we can add
Parav's proposal; if it tends to be more general, then we hand
the constraints over to the driver implementation. What do you think?


I think that this should be discussed in the driver implementation
stage, not in the spec.


Virtqueue level notifications coalescing, and device level notifications can be 
enabled together.
When both of them are enabled, per virtqueue notifications coalescing take 
priority over queue group level.

How do you enable  Virtqueue level notifications coalescing? Why are
they different entities?
I don't think that we should have priorities, but the last command
should be the one that dictates the coalescing parameters.

For example, let's look at vq0 (RX):
Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change
the parameters accordingly (all RX vqs should do the same).
Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0,
vq0 changes the parameters accordingly (all RX vqs are still using the
"old" parameters)
Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 changes the
parameters accordingly (all RX vqs should do the same).
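The "last command dictates" semantics of this vq0 example can be modeled with a small sketch: per-queue state only, no modes, and a group command is just a shortcut over every matching queue. Command names are abbreviated stand-ins, not the spec constants:

```python
def apply_cmd(rx_state, cmd):
    """rx_state maps RX vqn -> (max_packets, usecs). A group RX_SET touches
    every RX queue; a per-queue VQ_SET touches exactly one. Whichever
    command arrived last dictates a queue's parameters."""
    if cmd[0] == "RX_SET":
        for q in rx_state:
            rx_state[q] = cmd[1:]
    elif cmd[0] == "VQ_SET":
        rx_state[cmd[1]] = cmd[2:]
    return rx_state

rx = {0: (0, 0), 1: (0, 0)}         # after reset: all parameters 0
apply_cmd(rx, ("RX_SET", 15, 10))   # all RX queues -> (15, 10)
apply_cmd(rx, ("VQ_SET", 0, 8, 4))  # only vq0 -> (8, 4); vq1 keeps (15, 10)
apply_cmd(rx, ("RX_SET", 20, 5))    # all RX queues -> (20, 5) again
```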


Yes, I see what you mean; thanks for the clear example. This should be
the second scenario I described above;
let's discuss how to solve it in the reply above.

Thanks! :)



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-12 Thread Heng Qi
On Sat, Feb 11, 2023 at 01:47:16PM +, Parav Pandit wrote:
> 
> 
> > From: Alvaro Karsz 
> > Sent: Saturday, February 11, 2023 3:45 AM
> > 
> > > Please add short description something like,
> > >
> > > When the driver prefers to use per virtqueue notifications coalescing, 
> > > and if
> > queue group (transmit or receive) level notification coalescing is enabled, 
> > driver
> > SHOULD first disable device level notification coalescing.
> > > Or it should be,
> > >
> > 
> > I disagree here.
> > IMO "queue group level notification coalescing" is not something to enable 
> > or
> > disable, but a shortcut to set all TX/RX queues at once.
> That short cut is the enable/disablement.
> 
> > Why should the spec force a driver to "disable device level notification
> > coalescing" (I assume you mean send a
> > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)?
> Yes. Because to have well defined behavior when sw configured both one after 
> the another.
> 
> > What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET
> > command, and then a single queue traffic increases? why should it zero the
> > parameters to all other queues?
> That is short transition when driver is switching over to per queue mode.
> This is fine to have short glitch.
> 
> > I think that this should be discussed in the driver implementation stage, 
> > not in
> > the spec.
> > 
> There should be a clear guidance on how device should behave when both per q 
> and per device are configured.
> 
> > > Virtqueue level notifications coalescing, and device level notifications 
> > > can be
> > enabled together.
> > > When both of them are enabled, per virtqueue notifications coalescing take
> > priority over queue group level.
> > 
> > How do you enable  Virtqueue level notifications coalescing? Why are they
> > different entities?
> Using the new command that has vqn in it.
> 
> > I don't think that we should have priorities, but the last command should 
> > be the
> > one that dictates the coalescing parameters.
> > 
> Priority is applicable when driver has issued both the commands. Per tx/rx, 
> and per vqn.
> 
> > For example, let's look at vq0 (RX):
> > Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change
> > the parameters accordingly (all RX vqs should do the same).
> > Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0,
> > vq0 changes the parameters accordingly (all RX vqs are still using the "old"
> > parameters) Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0
> > changes the parameters accordingly (all RX vqs should do the same).
> In this example, per VQ were overridden with per device.
> Yes, so the last one is applicable, so priority of last one applies.
> 
> We continue to refuse to add the mode, and hence need to supply these 
> description of both the sequence on how device should behave.
> 
> Sequence_1:
> 1. tx/rx group level
> 2. per vqn level
> When #2 is done, VQ's whose per vq level is configured, follows vqn, rest of 
> the VQs follow #1.
> 
> Sequence_2:
> 1. per vqn
> 2. tx/rx group level
> When #2 is done, group level overrides the per vqn parameters. 

Examples of the two command sequences are indeed needed; I will add
them in the next version. :)

Thanks.





[virtio-dev] Re: [virtio-comment] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-12 Thread Heng Qi
On Sat, Feb 11, 2023 at 06:13:55PM +0200, Alvaro Karsz wrote:
> I think that I wasn't clear enough.
> 
> I'm not saying that we should not define in the spec how to handle a
> situation when a device receives both  RX_SET and VQ_SET (or a driver
> sends both).
> I'm saying that I don't think that the driver should handle the
> situation the way you described it:
> 
> > When the driver prefers to use per virtqueue notifications coalescing, and 
> > if queue group (transmit or receive) level notification coalescing is 
> > enabled, driver SHOULD first disable device level notification coalescing.
> > Or it should be,
> >
> > Virtqueue level notifications coalescing, and device level notifications 
> > can be enabled together.
> > When both of them are enabled, per virtqueue notifications coalescing take 
> > priority over queue group level.
> 
> This implies that we have 2 modes and have priorities.
> 
> I think that if we want to refer to this situation in the spec, it
> should be something like:
> "A Device should use the last coalescing parameters received for a
> virtqueue, regardless of the command used to deliver the parameters."

Your suggestion is good. The per-device command and the per-queue
command need some examples and behavior definitions; I will add them to
avoid misunderstandings.

Thanks.

> (just an example to make the point).




[virtio-dev] Re: [PATCH v2] virtio-net: support the virtqueue coalescing moderation

2023-02-12 Thread Heng Qi
On Sun, Feb 12, 2023 at 04:35:37AM -0500, Michael S. Tsirkin wrote:
> On Sat, Feb 11, 2023 at 01:47:16PM +, Parav Pandit wrote:
> > 
> > 
> > > From: Alvaro Karsz 
> > > Sent: Saturday, February 11, 2023 3:45 AM
> > > 
> > > > Please add short description something like,
> > > >
> > > > When the driver prefers to use per virtqueue notifications coalescing, 
> > > > and if
> > > queue group (transmit or receive) level notification coalescing is 
> > > enabled, driver
> > > SHOULD first disable device level notification coalescing.
> > > > Or it should be,
> > > >
> > > 
> > > I disagree here.
> > > IMO "queue group level notification coalescing" is not something to 
> > > enable or
> > > disable, but a shortcut to set all TX/RX queues at once.
> > That short cut is the enable/disablement.
> > 
> > > Why should the spec force a driver to "disable device level notification
> > > coalescing" (I assume you mean send a
> > > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET command with zeros)?
> > Yes. Because to have well defined behavior when sw configured both one 
> > after the another.
> > 
> > > What if the driver sends a VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET
> > > command, and then a single queue traffic increases? why should it zero the
> > > parameters to all other queues?
> > That is short transition when driver is switching over to per queue mode.
> > This is fine to have short glitch.
> > 
> > > I think that this should be discussed in the driver implementation stage, 
> > > not in
> > > the spec.
> > > 
> > There should be a clear guidance on how device should behave when both per 
> > q and per device are configured.
> > 
> > > > Virtqueue level notifications coalescing, and device level 
> > > > notifications can be
> > > enabled together.
> > > > When both of them are enabled, per virtqueue notifications coalescing 
> > > > take
> > > priority over queue group level.
> > > 
> > > How do you enable  Virtqueue level notifications coalescing? Why are they
> > > different entities?
> > Using the new command that has vqn in it.
> > 
> > > I don't think that we should have priorities, but the last command should 
> > > be the
> > > one that dictates the coalescing parameters.
> > > 
> > Priority is applicable when driver has issued both the commands. Per tx/rx, 
> > and per vqn.
> > 
> > > For example, let's look at vq0 (RX):
> > > Device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0 should change
> > > the parameters accordingly (all RX vqs should do the same).
> > > Then device receives VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET with vqn = 0,
> > > vq0 changes the parameters accordingly (all RX vqs are still using the 
> > > "old"
> > > parameters) Then device receives VIRTIO_NET_CTRL_NOTF_COAL_RX_SET, vq0
> > > changes the parameters accordingly (all RX vqs should do the same).
> > In this example, per VQ were overridden with per device.
> > Yes, so the last one is applicable, so priority of last one applies.
> > 
> > We continue to refuse to add the mode, and hence need to supply these 
> > description of both the sequence on how device should behave.
> > 
> > Sequence_1:
> > 1. tx/rx group level
> > 2. per vqn level
> > When #2 is done, VQ's whose per vq level is configured, follows vqn, rest 
> > of the VQs follow #1.
> > 
> > Sequence_2:
> > 1. per vqn
> > 2. tx/rx group level
> > When #2 is done, group level overrides the per vqn parameters. 
> 
> Since there's apparently some room for misunderstanding, I think adding these 
> examples can't hurt.

Ok, I'll handle this.

> I would also be more specific and just use specific numbers in the
> example, to avoid any ambiguity.

Agree.

Thanks.

> 
> -- 
> MST




[virtio-dev] Re: [PATCH v4] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature

2023-02-16 Thread Heng Qi




On 2023/2/16 3:32 PM, Alvaro Karsz wrote:

This patch makes several improvements to the notification coalescing
feature, including:

- Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx
   into a single struct, virtio_net_ctrl_coal, as they are identical.
- Emphasizing that the coalescing commands are best-effort.
- Defining the behavior of coalescing with regard to delivering
   notifications when a change occurs.
- Stating that the commands should apply to all the receive/transmit
   virtqueues.
- Stating that every receive/transmit virtqueue should count its own
   packets.
- A new intro explaining the entire coalescing operation.

Signed-off-by: Alvaro Karsz 
---
v2:
- Add the last 2 points to the patch.
- Rephrase the "commands are best-effort" note.
- Replace "notification" with "used buffer notification" to be
  more consistent.
v3:
- Add an intro explaining the entire coalescing operation.
v4:
- Minor wording fixes.
- Rephrase the general note.

  device-types/net/description.tex | 69 +++-
  1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 1741c79..11760f3 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1514,15 +1514,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
  If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
  send control commands for dynamically changing the coalescing parameters.
  
-\begin{lstlisting}

-struct virtio_net_ctrl_coal_rx {
-le32 rx_max_packets;
-le32 rx_usecs;
-};
+\begin{note}
+The behavior of the device in response to these commands is best-effort:
+the device may generate notifications more or less frequently than specified.
+\end{note}
  
-struct virtio_net_ctrl_coal_tx {

-le32 tx_max_packets;
-le32 tx_usecs;
+\begin{lstlisting}
+struct virtio_net_ctrl_coal {
+le32 max_packets;
+le32 usecs;
  };
  
  #define VIRTIO_NET_CTRL_NOTF_COAL 6

@@ -1532,49 +1532,62 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
Types / Network Device / Devi
  
  Coalescing parameters:

  \begin{itemize}
-\item \field{rx_usecs}: Maximum number of usecs to delay a RX notification.
-\item \field{tx_usecs}: Maximum number of usecs to delay a TX notification.
-\item \field{rx_max_packets}: Maximum number of packets to receive before a RX 
notification.
-\item \field{tx_max_packets}: Maximum number of packets to send before a TX 
notification.
+\item \field{usecs} for RX: Maximum number of usecs to delay a RX notification.
+\item \field{usecs} for TX: Maximum number of usecs to delay a TX notification.
+\item \field{max_packets} for RX: Maximum number of packets to receive before 
a RX notification.
+\item \field{max_packets} for TX: Maximum number of packets to send before a 
TX notification.
  \end{itemize}
  
-

  The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
  \begin{enumerate}
-\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and 
\field{tx_max_packets} parameters.
-\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and 
\field{rx_max_packets} parameters.
+\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{usecs} and 
\field{max_packets} parameters for all the transmit virtqueues.
+\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{usecs} and 
\field{max_packets} parameters for all the receive virtqueues.
  \end{enumerate}
  
-\subparagraph{RX Notifications}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Notifications}

+\subparagraph{Operation}\label{sec:Device Types / Network Device / Device 
Operation / Control Virtqueue / Notifications Coalescing / Operation}
+
+The device sends a used buffer notification once the notification conditions 
are met, if the notifications are not suppressed as explained in \ref{sec:Basic 
Facilities of a Virtio Device / Virtqueues / Used Buffer Notification 
Suppression}.
+
+When the device has non-zero \field{usecs} and non-zero \field{max_packets}, 
it starts counting usecs and packets upon receiving/sending a packet.
+The device counts packets and usecs for each receive virtqueue and transmit 
virtqueue separately.
+In this case, the notification conditions are met when \field{usecs} usecs 
elapses, or upon sending/receiving \field{max_packets} packets, whichever 
happens first.
+


Hi, Alvaro.

"when \field{usecs} usecs elapses"  --> "when \field{usecs} elapses", right?

Thanks for your clear work.


+When the device has \field{usecs} = 0 or \field{max_packets} = 0, the 
notification conditions are met after every packet received/sent.
+
+\subparagraph{RX Example}\label{sec:Device Types / Network Device / Device 
Operation / Control Virtqueue / Notifications Coalescing / RX Example}
  
  If, for example:

  \begin{itemize}
-\item \field{rx

Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi
On Fri, Feb 17, 2023 at 10:42:21AM +0200, Alvaro Karsz wrote:
> Hi Heng,
> 
> > +\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports the virtqueue
> > +notifications coalescing.
> > +
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit 
> > requirements}\label{sec:Device Types / Network Device
> >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or 
> > VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_VQ_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \end{description}
> >
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / 
> > Network Device / Feature bits / Legacy Interface: Feature bits}
> > @@ -4501,8 +4505,11 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
> > Types / Network Device / Devi
> >  };
> >
> >  #define VIRTIO_NET_CTRL_NOTF_COAL 6
> > - #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET  0
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET 0
> >   #define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET 2
> > + #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3
> > +
> >  \end{lstlisting}
> >
> >  Coalescing parameters:
> > @@ -4514,12 +4521,67 @@ \subsubsection{Control Virtqueue}\label{sec:Device 
> > Types / Network Device / Devi
> >  \end{itemize}
> >
> >
> > -The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
> > +The class VIRTIO_NET_CTRL_NOTF_COAL has 4 commands:
> >  \begin{enumerate}
> >  \item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and 
> > \field{tx_max_packets} parameters.
> >  \item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and 
> > \field{rx_max_packets} parameters.
> > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET: set the \field{max_packets} and 
> > \field{max_usecs} parameters for a enabled
> > +transmit/receive virtqueue whose 
> > number is \field{vqn}.
> > +\item VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET: the device gets the 
> > \field{max_packets} and \field{max_usecs} parameters of
> > +a enabled transmit/receive 
> > virtqueue whose number is \field{vqn}, and then
> > +responds them to the driver.
> >  \end{enumerate}
> >
> > +If the VIRTIO_NET_F_VQ_NOTF_COAL feature is negotiated:
> > +1. a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command to set the 
> > coalescing
> > +   parameters of a enabled transmit/receive virtqueue.
> > +2. a driver can send a VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET command to a 
> > device, and the device
> > +   responds to the driver with the coalescing parameters of a enabled 
> > transmit/receive virtqueue.
> > +
> > +\begin{lstlisting}
> > +struct virtio_net_ctrl_coal_vq {
> > +le16 vqn;
> > +le16 reserved;
> > +le32 max_packets;
> > +le32 max_usecs;
> > +};
> > +\end{lstlisting}
> > +
> 
> Maybe we can use struct virtio_net_ctrl_coal inside struct
> virtio_net_ctrl_coal_vq instead of repeating max_usecs and
> max_packets?
> I'm not sure if it would be confusing, what do you think?
> 

Hi Alvaro.

I guess you mean one of the following two forms:

#1
struct virtio_net_ctrl_coal {
le32 max_packets;
le32 max_usecs;
};

struct virtio_net_ctrl_coal_vq {
le16 vqn;
le16 reserved;
struct virtio_net_ctrl_coal coal;
} coal_vq;

#2
struct virtio_net_ctrl_coal {
le32 max_packets;
le32 max_usecs;
le16 vqn; // if _F_VQ_NOTF_COAL is negotiated
le16 reserved; // if _F_VQ_NOTF_COAL is negotiated
};

If it's #1, I think the format is a bit ugly: it's not semantic to use
coal_vq to send global commands when _F_VQ_NOTF_COAL is not negotiated,
and the presence of vqn and reserved is awkward.
If it's #2, I think this is a bit like the v1 version: using
virtio_net_ctrl_coal to send per-virtqueue commands does not seem
semantic either, but it is indeed more unified in function.

I think we should hear from Michael and Parav.

> > +Virtqueue coalescing parameters:
> > +\begin{itemize}
> > +\item \field{vqn}: The virtqueue number of the enabled transmit or receive 
> > virtqueue, excluding the control virtqueue.
> > +\item \field{max_packets}: The maximum number of packets sent/received by 
> > the specified virtqueue before a TX/RX notification.
> > +\item \field{max_usecs}: The maximum number of TX/RX usecs that the 
> > specified virtqueue delays a TX/RX notification.
> > +\end{itemize}
> > +
> > +\field{reserved} is reserved and it is ignored by the device.
> > +
> 
> max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with
> VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET.
> ("Maximum number of packets to receive/send before a RX/TX notification").
> The fact that this is applied to all VQs or to a specific one 

Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi
On Fri, Feb 17, 2023 at 11:40:15AM +0200, Alvaro Karsz wrote:
> > > Maybe we can use struct virtio_net_ctrl_coal inside struct
> > > virtio_net_ctrl_coal_vq instead of repeating max_usecs and
> > > max_packets?
> > > I'm not sure if it would be confusing, what do you think?
> > >
> >
> > Hi Alvaro.
> >
> > I guess you mean one of the following two forms:
> >
> > #1
> > struct virtio_net_ctrl_coal {
> > le32 max_packets;
> > le32 max_usecs;
> > };
> >
> > struct virtio_net_ctrl_coal_vq {
> > le16 vqn;
> > le16 reserved;
> > struct virtio_net_ctrl_coal coal;
> > } coal_vq;
> >
> > #2
> > struct virtio_net_ctrl_coal {
> > le32 max_packets;
> > le32 max_usecs;
> > le16 vqn; // if _F_VQ_NOTF_COAL is negotiated
> > le16 reserved; // if _F_VQ_NOTF_COAL is negotiated
> > };
> >
> > If it's #1, I think the format is a bit ugly, it's not semantic to use 
> > coal_vq to send global commands when _F_VQ_NOTF_COAL is not negotiated, and 
> > the presence of vqn and reserved is awkward.
> > If it's #2, I think this is a bit like the v1 version, using 
> > virtio_net_ctrl_coal as a virtual queue to send commands does not seem to 
> > be semantic, but it is indeed more unified in function.
> >
> > I think we should hear from Michael and Parav.
> >
> 
> I meant #1.
> We can see virtio_net_ctrl_coal as a struct holding coalescing
> parameters, regardless of the commands.
> Yes, let's wait for more comments on that.
> 
> > > > +Virtqueue coalescing parameters:
> > > > +\begin{itemize}
> > > > +\item \field{vqn}: The virtqueue number of the enabled transmit or 
> > > > receive virtqueue, excluding the control virtqueue.
> > > > +\item \field{max_packets}: The maximum number of packets sent/received 
> > > > by the specified virtqueue before a TX/RX notification.
> > > > +\item \field{max_usecs}: The maximum number of TX/RX usecs that the 
> > > > specified virtqueue delays a TX/RX notification.
> > > > +\end{itemize}
> > > > +
> > > > +\field{reserved} is reserved and it is ignored by the device.
> > > > +
> > >
> > > max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with
> > > VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET.
> > > ("Maximum number of packets to receive/send before a RX/TX notification").
> > > The fact that this is applied to all VQs or to a specific one is
> > > derived from the command.
> > > Same for max_usecs.
> > > Maybe we can join the coalescing parameters somehow instead of
> > > repeating the explanations?
> > >
> 
> Any thoughts on this part?

Good idea. If so, is there a good way to tie vqn into the description
of max_packets?

#1
\item \field{vqn}: The virtqueue number of the enabled transmit or receive 
virtqueue.
\item \field{max_packets}: The maximum number of packets sent/received by the 
specified virtqueue before a TX/RX notification.

#2
\item \field{max_packets}: Maximum number of packets to receive/send before a 
RX/TX notification.

Thanks.





Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v8] virtio-net: support inner header hash

2023-02-17 Thread Heng Qi



On 2023/2/18 12:24 AM, Parav Pandit wrote:

From: virtio-dev@lists.oasis-open.org  On
Behalf Of Heng Qi

[..]


We assume that hash_report_tunnel_types is still present in the next version,

I am little lost.


Hi, Parav.

You are not lost. I'm just answering some of Michael's questions and 
making assumptions. :)



I thought we all agreed that reporting just the tunnel type in data path path 
virtio_net_hdr is not useful to sw.
And hence we should omit in the virtio_net_hdr.

Did I miss the motivation to add it back?
If not, probably it is better to review and discuss v9 without it, which will 
be easier to discuss.


but it only exists in virtio net hdr and should be populated by the device after
the hash calculation. hash_tunnel_types already controls whether the device
computes internal header hashes.


I don't really follow this, hash_report_tunnel_type is better off
keeping it "report" literally.


Talking about VIRTIO_NET_F_HASH_TUNNEL here. Not
hash_report_tunnel_type.





[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi



On 2023/2/17 6:07 PM, Michael S. Tsirkin wrote:

On Fri, Feb 17, 2023 at 11:40:15AM +0200, Alvaro Karsz wrote:

Maybe we can use struct virtio_net_ctrl_coal inside struct
virtio_net_ctrl_coal_vq instead of repeating max_usecs and
max_packets?
I'm not sure if it would be confusing, what do you think?


Hi Alvaro.

I guess you mean one of the following two forms:

#1
struct virtio_net_ctrl_coal {
 le32 max_packets;
 le32 max_usecs;
};

struct virtio_net_ctrl_coal_vq {
 le16 vqn;
 le16 reserved;
 struct virtio_net_ctrl_coal coal;
} coal_vq;

#2
struct virtio_net_ctrl_coal {
 le32 max_packets;
 le32 max_usecs;
 le16 vqn; // if _F_VQ_NOTF_COAL is negotiated
 le16 reserved; // if _F_VQ_NOTF_COAL is negotiated
};

If it's #1, I think the format is a bit ugly: it's not semantic to use
coal_vq to send global commands when _F_VQ_NOTF_COAL is not negotiated,
and the presence of vqn and reserved is awkward.
If it's #2, I think this is a bit like the v1 version: using
virtio_net_ctrl_coal to send per-virtqueue commands does not seem
semantic either, but it is indeed more unified in function.

I think we should hear from Michael and Parav.


I meant #1.
We can see virtio_net_ctrl_coal as a struct holding coalescing
parameters, regardless of the commands.
Yes, let's wait for more comments on that.

Reusing virtio_net_ctrl_coal is a nice thought. Makes it a bit clearer
these have exactly the same role.
Whether to put vqn first or last does not matter imho.


+Virtqueue coalescing parameters:
+\begin{itemize}
+\item \field{vqn}: The virtqueue number of the enabled transmit or receive 
virtqueue, excluding the control virtqueue.
+\item \field{max_packets}: The maximum number of packets sent/received by the 
specified virtqueue before a TX/RX notification.
+\item \field{max_usecs}: The maximum number of TX/RX usecs that the specified 
virtqueue delays a TX/RX notification.
+\end{itemize}
+
+\field{reserved} is reserved and it is ignored by the device.
+

max_packets is the same with VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET and with
VIRTIO_NET_CTRL_NOTF_COAL_[T/R]X_SET.
("Maximum number of packets to receive/send before a RX/TX notification").
The fact that this is applied to all VQs or to a specific one is
derived from the command.
Same for max_usecs.
Maybe we can join the coalescing parameters somehow instead of
repeating the explanations?


Any thoughts on this part?

I think I agree. Generally I think we should first of all describe the
new VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, moving all explanation text
to that.

Then just explain that VIRTIO_NET_CTRL_NOTF_COAL_TX_SET and
VIRTIO_NET_CTRL_NOTF_COAL_RX_SET have the same effect
as VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET repeated for
all currently enabled tx/rx vqs.
Plus maybe a single annotated example where there's a mix of
VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET, VIRTIO_NET_CTRL_NOTF_COAL_RX_SET and
VIRTIO_NET_CTRL_NOTF_COAL_TX_SET commands. For example with 2 vq pairs:

1. VIRTIO_NET_CTRL_NOTF_COAL_RX_SET sets for vq 0 and 2, vq 1 and 3 retain 
reset value
2. VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET sets vq 0, vq 2 retains value from 1
3. VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET sets vq 1, vq 3 retains reset value
4. VIRTIO_NET_CTRL_NOTF_COAL_TX_SET overrides command 3

no need for many examples.



Good idea. This is a clear and comprehensive example.

Thanks.









Re: [virtio-dev] Re: [PATCH v3] virtio-net: support the virtqueue coalescing moderation

2023-02-17 Thread Heng Qi
On Fri, Feb 17, 2023 at 04:12:34PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Friday, February 17, 2023 6:35 AM
> 
> > > We mention the device reset case, but nothing about VQ reset.
> > >
> > > I feel that no matter how we handle this, we break something.
> > >
> > > Having default coalescing values may collide with "Upon reset, a
> > > device MUST initialize all coalescing parameters to 0."
> > 
> > No this is after device reset.
> > 
> > > We can say that VQ reset doesn't affect the "global parameters" and a
> > > device reset does, but this collides with "Device Requirements:
> > > Virtqueue Reset".
> > >
> Not really.
> When the device resets, VQ objects are destroyed in the device.
> So VQ's notifications parameters doesn't exists on device reset.
> 
> And so the same case with VQ reset.
> When a VQ is reset (disabled), VQ's notifications configuration is removed in 
> the device too.
> Just like its desc ring and other addresses are invalid.

Yes, but there seems to be such a situation: when the device is
reactivated, as the specification says, all parameters are set to 0
(the device uses these parameters as its default configuration).
When CTRL_COAL_SET and CTRL_COAL_VQ_SET are sent, the configuration is
updated (the parameters of each vq may differ, but the global parameter
configuration may be recorded). At this point, if a vq is reset, should
its parameters be 0, or the recorded global parameters, after it is
re-enabled?

> 
> > > In fact, resetting coalescing values after vq reset may be derived
> > > from "Upon reset, a device MUST initialize all coalescing parameters
> > > to 0".
> > > This is consistent with "Device Requirements: Virtqueue Reset".
> > >
> > > I can add in my patch some clarifications.
> > >
> > > This will break the linux virtio_net ethtool implementation a little,
> > > we'll need to fix it.
> > 
> > Not good. I feel we must come up with spec that is backwards compatible.
> > Hmm, maybe this is why Parav kept talking about modes.
> > I did not realize at the time, sorry Parav.
> > 
> > I still feel modes are not the best way to describe things so I propose 
> > this:
> > - in addition to per vq parameters, device that supports global TX/RX
> >   commands and ring reset maintains two sets of default parameters: for RX 
> > and
> > TX
> > - existing commands change default and change all enabled vqs
> >   of the correct type (RX/TX) to the same value
> > - vq reset changes a vq to the default
> > - device reset changes defaults to 0 and changes all vqs to  0
> > 
> > note how defaults are only used for ring reset.  is "vq reset parameter"
> > a better name? I feel we will repeat "reset" too many times in a sentence 
> > if we
> > call it that though.
> > 
> > So fundamentally the only change is with RING_RESET, then default is not
> > always 0, it can be set by the global command.
> 
> True default is not zero when global are configured.
> It is ok to report VQ parameters on GET command when globals are configured.




[virtio-dev] Re: [virtio-comment] [PATCH v6] virtio-net: Fix and update VIRTIO_NET_F_NOTF_COAL feature

2023-02-19 Thread Heng Qi

Hi, Alvaro.
Thanks for your work !
As suggested earlier in the 'virtqueue coalescing' thread, I will be 
working on top of your patch. :)



On 2023/2/19 5:03 PM, Alvaro Karsz wrote:

This patch makes several improvements to the notification coalescing
feature, including:

- Consolidating virtio_net_ctrl_coal_tx and virtio_net_ctrl_coal_rx
   into a single struct, virtio_net_ctrl_coal, as they are identical.
- Emphasizing that the coalescing commands are best-effort.
- Defining the behavior of coalescing with regard to delivering
   notifications when a change occurs.
- Stating that the commands should apply to all the receive/transmit
   virtqueues.
- Stating that every receive/transmit virtqueue should count its own
   packets.
- A new intro explaining the entire coalescing operation.

Signed-off-by: Alvaro Karsz 
Reviewed-by: Parav Pandit 
---
v2:
- Add the last 2 points to the patch.
- Rephrase the "commands are best-effort" note.
- Replace "notification" with "used buffer notification" to be
  more consistent.
v3:
- Add an intro explaining the entire coalescing operation.
v4:
- Minor wording fixes.
- Rephrase the general note.
v5:
- Replace virtio_net_ctrl_coal->usecs with
  virtio_net_ctrl_coal->max_usecs
v6:
- Replace usecs with microseconds.
- Make the TX/RX examples relevant for MQ as well.
- Explain that upon meeting the notification condition, the
  device starts counting packets and microseconds again.
- Add Parav's Reviewed-by tag (Thanks!)

  device-types/net/description.tex | 74 +++-
  1 file changed, 44 insertions(+), 30 deletions(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 1741c79..c7a8ca6 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -1514,15 +1514,15 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
  If the VIRTIO_NET_F_NOTF_COAL feature is negotiated, the driver can
  send control commands for dynamically changing the coalescing parameters.
  
-\begin{lstlisting}

-struct virtio_net_ctrl_coal_rx {
-le32 rx_max_packets;
-le32 rx_usecs;
-};
+\begin{note}
+The behavior of the device in response to these commands is best-effort:
+the device may generate notifications more or less frequently than specified.
+\end{note}
  
-struct virtio_net_ctrl_coal_tx {

-le32 tx_max_packets;
-le32 tx_usecs;
+\begin{lstlisting}
+struct virtio_net_ctrl_coal {
+le32 max_packets;
+le32 max_usecs;
  };
  
  #define VIRTIO_NET_CTRL_NOTF_COAL 6

@@ -1532,49 +1532,63 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
  
  Coalescing parameters:

  \begin{itemize}
-\item \field{rx_usecs}: Maximum number of usecs to delay a RX notification.
-\item \field{tx_usecs}: Maximum number of usecs to delay a TX notification.
-\item \field{rx_max_packets}: Maximum number of packets to receive before a RX notification.
-\item \field{tx_max_packets}: Maximum number of packets to send before a TX notification.
+\item \field{max_usecs} for RX: Maximum number of microseconds to delay a RX notification.
+\item \field{max_usecs} for TX: Maximum number of microseconds to delay a TX notification.
+\item \field{max_packets} for RX: Maximum number of packets to receive before a RX notification.
+\item \field{max_packets} for TX: Maximum number of packets to send before a TX notification.
  \end{itemize}
  
-

  The class VIRTIO_NET_CTRL_NOTF_COAL has 2 commands:
  \begin{enumerate}
-\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{tx_usecs} and \field{tx_max_packets} parameters.
-\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{rx_usecs} and \field{rx_max_packets} parameters.
+\item VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all the transmit virtqueues.
+\item VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: set the \field{max_usecs} and \field{max_packets} parameters for all the receive virtqueues.
  \end{enumerate}
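(As an aside for readers following the patch: a driver would issue one of these two commands as a control-virtqueue request carrying the consolidated struct. The sketch below is my own illustration, not part of the patch; the TX_SET/RX_SET command values of 0 and 1 and the flat class/command/payload layout are assumptions for the example.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_CTRL_NOTF_COAL        6
/* Command values assumed here for illustration only. */
#define VIRTIO_NET_CTRL_NOTF_COAL_TX_SET 0
#define VIRTIO_NET_CTRL_NOTF_COAL_RX_SET 1

/* Consolidated coalescing parameters, as in the patch (le32 on the wire). */
struct virtio_net_ctrl_coal {
    uint32_t max_packets;
    uint32_t max_usecs;
};

/* Fill a control command payload asking the device to delay TX used buffer
 * notifications until `pkts` packets were sent or `usecs` microseconds
 * elapsed, whichever comes first. Returns the number of bytes written. */
static size_t build_tx_coal_cmd(uint8_t *buf, uint32_t pkts, uint32_t usecs)
{
    struct virtio_net_ctrl_coal coal = {
        .max_packets = pkts,  /* host-endian here; a real driver writes le32 */
        .max_usecs   = usecs,
    };
    buf[0] = VIRTIO_NET_CTRL_NOTF_COAL;        /* class */
    buf[1] = VIRTIO_NET_CTRL_NOTF_COAL_TX_SET; /* command */
    memcpy(buf + 2, &coal, sizeof(coal));
    return 2 + sizeof(coal);
}
```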
  
-\subparagraph{RX Notifications}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / RX Notifications}

+\subparagraph{Operation}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing / Operation}
+
+The device sends a used buffer notification once the notification conditions are met, if the notifications are not suppressed as explained in \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}.
+
+When the device has non-zero \field{max_usecs} and non-zero \field{max_packets}, it starts counting microseconds and packets upon receiving/sending a packet.
+The device counts packets and microseconds for each receive virtqueue and transmit virtqueue separately.
+In this case, the notification c
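(To make the per-virtqueue counting rules above concrete, here is a rough device-side sketch. This is my own illustration, not the patch text; it assumes both limits are non-zero and that reaching either limit triggers a notification and restarts both counters, as the v6 changelog describes.)

```c
#include <assert.h>
#include <stdint.h>

/* Per-virtqueue coalescing state; each receive and transmit virtqueue
 * keeps its own counters, as the patch requires. */
struct vq_coal_state {
    uint32_t max_packets; /* limits set via the NOTF_COAL SET commands */
    uint32_t max_usecs;
    uint32_t packets;     /* packets since counting (re)started */
    uint32_t usecs;       /* microseconds since counting (re)started */
};

/* Called after the device sends/receives a packet or time advances.
 * Returns 1 if a used buffer notification should be sent now (unless
 * notifications are suppressed), restarting both counters. */
static int coal_should_notify(struct vq_coal_state *s)
{
    if (s->packets >= s->max_packets || s->usecs >= s->max_usecs) {
        s->packets = 0; /* start counting again after notifying */
        s->usecs = 0;
        return 1;
    }
    return 0;
}
```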
