[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-04-08 Thread Michael S. Tsirkin
On Thu, Mar 30, 2023 at 08:37:21PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/21 3:48 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > > We use the most basic GRE header fields (not NVGRE), not even optional
> > > fields.
> > I'd say yes, the most convincing usecase is with legacy GRE.
> 
> Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric
> hashing. Please consider this.

Using a specific key seems fragile though in that a different one is
needed for e.g. ipv4 and ipv6.  An issue with VXLAN and GENEVE, yes?
Will support for XOR hashing address this sufficiently or is that not
acceptable to you? Or alternatively a modified Toeplitz, e.g. this
https://inbox.dpdk.org/dev/20190731123040.gg4...@6wind.com/
suggests Mellanox supports that. WDYT?
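For what it's worth, the symmetric-key option can be sketched quickly. The repeating 2-byte key pattern below comes from published work on symmetric RSS; the function and variable names are mine, so treat this as an illustration of the idea rather than a spec proposal:

```python
def toeplitz(key: bytes, data: bytes) -> int:
    """Standard Toeplitz hash: for each set bit of the input (MSB first),
    XOR in the 32-bit window of the key starting at that bit position."""
    assert len(key) * 8 >= len(data) * 8 + 32
    k = int.from_bytes(key, "big")
    kbits = len(key) * 8
    h = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                pos = i * 8 + b
                h ^= (k >> (kbits - 32 - pos)) & 0xFFFFFFFF
    return h

# A symmetric key: a repeating 2-byte pattern. Because the key is periodic
# with a 16-bit period, fields that swap positions by a multiple of 16 bits
# (src/dst address, src/dst port) contribute identically to the hash, so
# h(a->b) == h(b->a) -- for IPv4 and IPv6 alike, with the same key.
SYM_KEY = bytes.fromhex("6d5a") * 20  # 40 bytes, a typical RSS key length

src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
sport, dport = (12345).to_bytes(2, "big"), (443).to_bytes(2, "big")

h_fwd = toeplitz(SYM_KEY, src + dst + sport + dport)
h_rev = toeplitz(SYM_KEY, dst + src + dport + sport)
assert h_fwd == h_rev  # both directions land on the same queue
```

The same periodicity argument applies to an IPv6 12-tuple, since the 128-bit addresses are also a multiple of 16 bits.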

-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-30 Thread Heng Qi




On 2023/3/21 3:48 AM, Michael S. Tsirkin wrote:

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:

We use the most basic GRE header fields (not NVGRE), not even optional
fields.

I'd say yes, the most convincing usecase is with legacy GRE.


Yes. But we still have a strong need for VXLAN and GENEVE to do 
symmetric hashing. Please consider this.



Given that, do you need the rest of the protocols there?


I checked the current tunneling protocols used for overlay networks and 
compared their respective RFCs against each other.


They are:

1. GRE_rfc2784: This protocol is only specified for IPv4 and is used as 
either the payload or delivery protocol.
    link: https://datatracker.ietf.org/doc/rfc2784/

2. GRE_rfc2890: This protocol describes extensions by which two fields, 
Key and Sequence Number, can be optionally carried in the GRE header.
    link: https://www.rfc-editor.org/rfc/rfc2890

3. GRE_rfc7676: IPv6 Support for Generic Routing Encapsulation (GRE). 
This protocol is specified for IPv6 and is used as either the payload or 
delivery protocol.
    Note that this does not change the GRE header format or any 
behaviors specified by RFC 2784 or RFC 2890.
    link: https://datatracker.ietf.org/doc/rfc7676/

4. GRE-in-UDP: GRE-in-UDP Encapsulation. This specifies a method of 
encapsulating network protocol packets within GRE and UDP headers. The 
GRE-in-UDP encapsulation allows the UDP source port field to be used as 
an entropy field. This protocol is specified for IPv4 and IPv6, and is 
used as either the payload or delivery protocol.
    link: https://www.rfc-editor.org/rfc/rfc8086

5. VXLAN: Virtual eXtensible Local Area Network.
    link: https://datatracker.ietf.org/doc/rfc7348/

6. VXLAN-GPE: Generic Protocol Extension for VXLAN. This protocol 
describes extending Virtual eXtensible Local Area Network (VXLAN) via 
changes to the VXLAN header.
    link: https://www.ietf.org/archive/id/draft-ietf-nvo3-vxlan-gpe-12.txt

7. GENEVE: Generic Network Virtualization Encapsulation.
    link: https://datatracker.ietf.org/doc/rfc8926/

8. IPIP: IP Encapsulation within IP.
    link: https://www.rfc-editor.org/rfc/rfc2003

9. NVGRE: Network Virtualization Using Generic Routing Encapsulation.
    link: https://www.rfc-editor.org/rfc/rfc7637.html

10. STT: Stateless Transport Tunneling. STT is particularly useful when 
some tunnel endpoints are in end-systems, as it utilizes the 
capabilities of the network interface card to improve performance.
    link: https://www.ietf.org/archive/id/draft-davie-stt-08.txt
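Since the GRE variants above differ mainly in which optional fields the 4-byte base header advertises, a small parser sketch may help. This is an illustrative Python fragment based on the RFC 2784/2890 bit layout, not code from the patch:

```python
import struct

def parse_gre_base(hdr: bytes) -> dict:
    """Parse the 4-byte GRE base header (RFC 2784, extended by RFC 2890)."""
    flags_ver, proto = struct.unpack("!HH", hdr[:4])
    return {
        "checksum_present": bool(flags_ver & 0x8000),  # C bit (RFC 2784)
        "key_present": bool(flags_ver & 0x2000),       # K bit (RFC 2890)
        "seq_present": bool(flags_ver & 0x1000),       # S bit (RFC 2890)
        "version": flags_ver & 0x0007,                 # must be 0
        "protocol": proto,                             # EtherType of payload
    }

# A bare RFC 2784 header carrying IPv4: no optional fields at all, so
# there is nothing flow-specific in the outer header to hash on.
g = parse_gre_base(bytes.fromhex("00000800"))
assert g["version"] == 0 and g["protocol"] == 0x0800
assert not (g["checksum_present"] or g["key_present"] or g["seq_present"])
```

This is why the legacy-GRE case is the most compelling one for inner header hash: the base header carries no entropy at all.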

Among them, GRE_rfc2784, VXLAN and GENEVE are our internal requirements 
for inner header hashing.

GRE_rfc2784 requires RSS hashing to different queues.
For the monitoring scenario I mentioned, VXLAN or GRE_rfc2890 also needs 
to use inner symmetric hashing.


I know you would like this feature to support only GRE_rfc2784, since 
it is the most convincing case for RSS.

But RSS hashes packets of different flows to different queues, and 
packets of the same flow to the same queue. So inner symmetric hashing 
does not distort the role of RSS, and I believe that for modern 
protocols like VXLAN and others, inner symmetric hashing is still a 
common requirement for other vendors using virtio devices.

So, can we make this feature support all the protocols I checked above, 
so that vendors can choose which of them to support? This would also 
avoid, as much as possible, having to add new tunnel protocols in the 
near future.

Do you think that is OK?

Again: I'm so sorry I didn't realize I missed this until I checked my 
emails. 



We can start with just legacy GRE (think about including IPv6 or not).
Given how narrow this usecase is, I'd be fine with focusing
just on this, and addressing more protocols down the road
with something programmable like BPF. WDYT?







[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-30 Thread Heng Qi




On 2023/3/21 3:45 AM, Michael S. Tsirkin wrote:

On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:

On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:


On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:

On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:


On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:

On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port

The endpoint of the tunnel is called the gateway (with DPDK running on top of it).

1. When there is no inner header hash, entropy can be inserted into the
UDP source port of the tunnel's outer header, and the tunnel packet is
then handed over to the host. The host has to dedicate a portion of its
CPUs to parsing the outer headers (without dropping them) and computing
the inner hash over the inner payloads, and then use that inner hash to
forward the packets to another portion of the CPUs that are responsible
for processing.
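As a rough illustration of the entropy technique being discussed (the port range follows the dynamic-range recommendation in RFC 7348 and RFC 8086; the particular fold is a hypothetical example, not from the patch):

```python
def entropy_src_port(inner_flow_hash: int) -> int:
    """Fold an inner-flow hash into the outer UDP source port.

    VXLAN (RFC 7348) and GRE-in-UDP (RFC 8086) recommend picking the
    source port from the dynamic/private range (49152-65535) so
    middleboxes still see a well-formed ephemeral port.
    """
    lo, hi = 49152, 65535
    return lo + (inner_flow_hash % (hi - lo + 1))

p = entropy_src_port(0xDEADBEEF)
assert 49152 <= p <= 65535
assert entropy_src_port(0xDEADBEEF) == p  # deterministic per flow
```

The cost mentioned above is that the receiving host cannot recover the inner hash from this folded port value alone, so it must re-parse the inner headers itself.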

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?

Let's simplify some details and take a fresh look at two different
scenarios: VXLAN and GENEVE (Scenario 1) and GRE (Scenario 2).

1. In Scenario 1, we can improve the processing performance of the same
flow by implementing inner symmetric hashing.

This is because even though client1 and client2 communicate
bidirectionally through the same flow, their data may pass through, and
be encapsulated by, different tunnels, resulting in the same flow being
hashed to different queues and processed by different CPUs.

To ensure consistency and optimized processing, we need to parse out the
inner header and compute a symmetric hash over it using a special RSS key.

Sorry for not mentioning the inner symmetric hash before; I wanted to
avoid introducing more concepts, but it is indeed a kind of inner hash.

If parts of a flow go through different tunnels won't this cause
reordering at the network level? Why is it so important to prevent it at
the nic then?  Or, since you are stressing symmetric hash, are you
talking about TX and RX side going through different tunnels?

Yes, the directions client1->client2 and client2->client1 may go through
different tunnels.
Using inner symmetric hashing lets the same CPU process both directions
of the same flow, which improves performance.
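A minimal sketch of what "symmetric" buys here, assuming a software flow table: sorting the two endpoints is one simple way to make both directions collide, as an alternative to a symmetric hardware key (names and addresses are illustrative):

```python
def symmetric_flow_key(saddr: str, daddr: str, sport: int, dport: int):
    """Canonicalize a flow tuple so a->b and b->a yield the same key.

    Ordering the two endpoints before hashing is a simple software
    equivalent of using a symmetric RSS key: either way, both
    directions of a flow end up in the same queue/CPU bucket.
    """
    a, b = sorted([(saddr, sport), (daddr, dport)])
    return a + b  # flat tuple, usable directly as a dict key

k1 = symmetric_flow_key("203.0.113.1", "198.51.100.7", 33000, 443)
k2 = symmetric_flow_key("198.51.100.7", "203.0.113.1", 443, 33000)
assert k1 == k2  # both directions map to the same bucket
```

The hardware symmetric-key approach achieves the same property without software canonicalization, which is the point of doing it in the device.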

Well sure but ... are you just doing forwarding or inner processing too?

When there is an inner hash, there is no forwarding anymore.


If forwarding why do you care about matching TX and RX queues? If e2e

In fact, we are just matching on the same rx queue. The network topology
is roughly as follows. The processing host will receive the packets
sent from client1 and client2 respectively, then make some action judgments,
and return them to client2 and client1 respectively.

client1                        client2
   |                              |
   |          __________          |
   +-------->|  tunnel  |<--------+
             |__________|
          |                |
          |                |
          v                v
        +-----------------+
        | processing host |
        +-----------------+

Thanks.

monitoring host would be a better term


Sure.

I'm so sorry I didn't realize I missed this until I checked my emails.  :(





processing can't you just store the incoming hash in the flow and reuse
on TX? This is what Linux is doing...






2. In Scenario 2 with GRE, the lack of outer transport headers means that
flows between multiple communication pairs encapsulated by the same
tunnel will all be hashed to the same queue. To address this, we need to
implement inner hashing to improve the performance of RSS. By parsing and
calculating the inner hash, different flows can be hashed to different
queues.
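The RSS mechanics at stake can be sketched as follows; the table size and queue count are illustrative, but the indirection-table lookup resembles how standard RSS selects a receive queue:

```python
def rss_queue(hash32: int, indirection_table: list) -> int:
    """Pick a receive queue the way RSS does: low-order bits of the
    (inner) hash index an indirection table whose entries name queues."""
    return indirection_table[hash32 % len(indirection_table)]

# A 128-entry table spreading flows over 4 queues (sizes illustrative).
table = [i % 4 for i in range(128)]

assert rss_queue(0x00000000, table) == 0
assert rss_queue(0x00000005, table) == 1  # 5 % 128 = 5, table[5] = 1

# Without inner hashing, every flow inside one bare-GRE tunnel shares a
# single outer-header hash, so rss_queue() returns the same queue for
# all of them; hashing the inner header spreads them across queues.
```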

Thanks.



Well 2 is at least inexact, there's flowID there. It's just 8 bit

We use the most basic GRE header fields (not NVGRE), not even optional
fields.
There is also no flow id in the GRE header, should you be referring to
NVGRE?

Thanks.


so not sufficient if there are more than 512 queues. Still 512 queues
is quite a lot. Are you trying to solve for 

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-20 Thread Michael S. Tsirkin
On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> We use the most basic GRE header fields (not NVGRE), not even optional
> fields.

I'd say yes, the most convincing usecase is with legacy GRE.
Given that, do you need the rest of the protocols there?
We can start with just legacy GRE (think about including IPv6 or not).
Given how narrow this usecase is, I'd be fine with focusing
just on this, and addressing more protocols down the road
with something programmable like BPF. WDYT?

-- 
MST





[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-20 Thread Michael S. Tsirkin
On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:
> On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > > 
> > > 
> > > > On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:
> > > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > > 
> > > > > 
> > > > > On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > > > If the tunnel is used to encapsulate the packets, the 
> > > > > > > > > > > hash calculated
> > > > > > > > > > > using the outer header of the receive packets is always 
> > > > > > > > > > > fixed for the
> > > > > > > > > > > same flow packets, i.e. they will be steered to the same 
> > > > > > > > > > > receive queue.
> > > > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > > > Yes, you are right. That's what we did before the inner 
> > > > > > > > > header hash, but it
> > > > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > > > 
> > > > > > > > > > For example geneve spec says:
> > > > > > > > > > 
> > > > > > > > > >it is necessary for entropy from encapsulated 
> > > > > > > > > > packets to be
> > > > > > > > > >exposed in the tunnel header.  The most common 
> > > > > > > > > > technique for this is
> > > > > > > > > >to use the UDP source port
> > > > > > > > > The end point of the tunnel called the gateway (with DPDK on 
> > > > > > > > > top of it).
> > > > > > > > > 
> > > > > > > > > 1. When there is no inner header hash, entropy can be 
> > > > > > > > > inserted into the udp
> > > > > > > > > src port of the outer header of the tunnel,
> > > > > > > > > and then the tunnel packet is handed over to the host. The 
> > > > > > > > > host needs to
> > > > > > > > > take out a part of the CPUs to parse the outer headers (but 
> > > > > > > > > not drop them)
> > > > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > > > and then use the inner
> > > > > > > > > hash to forward them to another part of the CPUs that are 
> > > > > > > > > responsible for
> > > > > > > > > processing.
> > > > > > > > I don't get this part. Leave inner hashes to the guest inside 
> > > > > > > > the
> > > > > > > > tunnel, why is your host doing this?
> > > > > 
> > > > > Let's simplify some details and take a fresh look at two different
> > > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > > > 
> > > > > 1. In Scenario1, we can improve the processing performance of the 
> > > > > same flow
> > > > > by implementing inner symmetric hashing.
> > > > > 
> > > > > This is because even though client1 and client2 communicate 
> > > > > bidirectionally
> > > > > through the same flow, their data may pass
> > > > > 
> > > > > through and be encapsulated by different tunnels, resulting in the 
> > > > > same flow
> > > > > being hashed to different queues and processed by different CPUs.
> > > > > 
> > > > > To ensure consistency and optimized processing, we need to parse out 
> > > > > the
> > > > > inner header and compute a symmetric hash on it using a special rss 
> > > > > key.
> > > > > 
> > > > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > > > prevent the introduction of more concepts, but it is indeed a kind of 
> > > > > inner
> > > > > hash.
> > > > If parts of a flow go through different tunnels won't this cause
> > > > reordering at the network level? Why is it so important to prevent it at
> > > > the nic then?  Or, since you are stressing symmetric hash, are you
> > > > talking about TX and RX side going through different tunnels?
> > > 
> > > Yes, the directions client1->client2 and client2->client1 may go through
> > > different tunnels.
> > > Using inner symmetric hashing can satisfy the same CPU to process two
> > > directions of the same flow to improve performance.
> > 
> > Well sure but ... are you just doing forwarding or inner processing too?
> 
> When there is an inner hash, there is no forwarding anymore.
> 
> > If forwarding why do you care about matching TX and RX queues? If e2e
> 
> In fact, we are just matching on the same rx queue. The network topology
> is roughly as follows. The processing host will receive the packets
> sent from client1 and client2 respectively, then make some action judgments,
> and return them to client2 and client1 respectively.
> 
> client1   client2
>| |
>|  __ |
>+->| tunnel |<+
>   ||
>  

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-16 Thread Heng Qi
On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > 
> > 
> > On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:
> > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > 
> > > > 
> > > > On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > > If the tunnel is used to encapsulate the packets, the hash 
> > > > > > > > > > calculated
> > > > > > > > > > using the outer header of the receive packets is always 
> > > > > > > > > > fixed for the
> > > > > > > > > > same flow packets, i.e. they will be steered to the same 
> > > > > > > > > > receive queue.
> > > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > > Yes, you are right. That's what we did before the inner header 
> > > > > > > > hash, but it
> > > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > > 
> > > > > > > > > For example geneve spec says:
> > > > > > > > > 
> > > > > > > > >it is necessary for entropy from encapsulated packets 
> > > > > > > > > to be
> > > > > > > > >exposed in the tunnel header.  The most common 
> > > > > > > > > technique for this is
> > > > > > > > >to use the UDP source port
> > > > > > > > The end point of the tunnel called the gateway (with DPDK on 
> > > > > > > > top of it).
> > > > > > > > 
> > > > > > > > 1. When there is no inner header hash, entropy can be inserted 
> > > > > > > > into the udp
> > > > > > > > src port of the outer header of the tunnel,
> > > > > > > > and then the tunnel packet is handed over to the host. The host 
> > > > > > > > needs to
> > > > > > > > take out a part of the CPUs to parse the outer headers (but not 
> > > > > > > > drop them)
> > > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > > and then use the inner
> > > > > > > > hash to forward them to another part of the CPUs that are 
> > > > > > > > responsible for
> > > > > > > > processing.
> > > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > > tunnel, why is your host doing this?
> > > > 
> > > > Let's simplify some details and take a fresh look at two different
> > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > > 
> > > > 1. In Scenario1, we can improve the processing performance of the same 
> > > > flow
> > > > by implementing inner symmetric hashing.
> > > > 
> > > > This is because even though client1 and client2 communicate 
> > > > bidirectionally
> > > > through the same flow, their data may pass
> > > > 
> > > > through and be encapsulated by different tunnels, resulting in the same 
> > > > flow
> > > > being hashed to different queues and processed by different CPUs.
> > > > 
> > > > To ensure consistency and optimized processing, we need to parse out the
> > > > inner header and compute a symmetric hash on it using a special rss key.
> > > > 
> > > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > > prevent the introduction of more concepts, but it is indeed a kind of 
> > > > inner
> > > > hash.
> > > If parts of a flow go through different tunnels won't this cause
> > > reordering at the network level? Why is it so important to prevent it at
> > > the nic then?  Or, since you are stressing symmetric hash, are you
> > > talking about TX and RX side going through different tunnels?
> > 
> > Yes, the directions client1->client2 and client2->client1 may go through
> > different tunnels.
> > Using inner symmetric hashing can satisfy the same CPU to process two
> > directions of the same flow to improve performance.
> 
> Well sure but ... are you just doing forwarding or inner processing too?

When there is an inner hash, there is no forwarding anymore.

> If forwarding why do you care about matching TX and RX queues? If e2e

In fact, we are just matching on the same rx queue. The network topology
is roughly as follows. The processing host will receive the packets
sent from client1 and client2 respectively, then make some action judgments,
and return them to client2 and client1 respectively.

client1                        client2
   |                              |
   |          __________          |
   +-------->|  tunnel  |<--------+
             |__________|
          |                |
          |                |
          v                v
        +-----------------+
        | processing host |
        +-----------------+

Thanks.

> processing can't you just store the incoming hash in the flow and reuse
> on TX? This is what Linux is doing...
> 
> 
> 
> > > 
> > 

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-15 Thread Michael S. Tsirkin
On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:
> > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > 
> > > 
> > > On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > If the tunnel is used to encapsulate the packets, the hash 
> > > > > > > > > calculated
> > > > > > > > > using the outer header of the receive packets is always fixed 
> > > > > > > > > for the
> > > > > > > > > same flow packets, i.e. they will be steered to the same 
> > > > > > > > > receive queue.
> > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > Yes, you are right. That's what we did before the inner header 
> > > > > > > hash, but it
> > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > 
> > > > > > > > For example geneve spec says:
> > > > > > > > 
> > > > > > > >it is necessary for entropy from encapsulated packets to 
> > > > > > > > be
> > > > > > > >exposed in the tunnel header.  The most common technique 
> > > > > > > > for this is
> > > > > > > >to use the UDP source port
> > > > > > > The end point of the tunnel called the gateway (with DPDK on top 
> > > > > > > of it).
> > > > > > > 
> > > > > > > 1. When there is no inner header hash, entropy can be inserted 
> > > > > > > into the udp
> > > > > > > src port of the outer header of the tunnel,
> > > > > > > and then the tunnel packet is handed over to the host. The host 
> > > > > > > needs to
> > > > > > > take out a part of the CPUs to parse the outer headers (but not 
> > > > > > > drop them)
> > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > and then use the inner
> > > > > > > hash to forward them to another part of the CPUs that are 
> > > > > > > responsible for
> > > > > > > processing.
> > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > tunnel, why is your host doing this?
> > > 
> > > Let's simplify some details and take a fresh look at two different
> > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > 
> > > 1. In Scenario1, we can improve the processing performance of the same 
> > > flow
> > > by implementing inner symmetric hashing.
> > > 
> > > This is because even though client1 and client2 communicate 
> > > bidirectionally
> > > through the same flow, their data may pass
> > > 
> > > through and be encapsulated by different tunnels, resulting in the same 
> > > flow
> > > being hashed to different queues and processed by different CPUs.
> > > 
> > > To ensure consistency and optimized processing, we need to parse out the
> > > inner header and compute a symmetric hash on it using a special rss key.
> > > 
> > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > prevent the introduction of more concepts, but it is indeed a kind of 
> > > inner
> > > hash.
> > If parts of a flow go through different tunnels won't this cause
> > reordering at the network level? Why is it so important to prevent it at
> > the nic then?  Or, since you are stressing symmetric hash, are you
> > talking about TX and RX side going through different tunnels?
> 
> Yes, the directions client1->client2 and client2->client1 may go through
> different tunnels.
> Using inner symmetric hashing can satisfy the same CPU to process two
> directions of the same flow to improve performance.

Well sure but ... are you just doing forwarding or inner processing too?
If forwarding why do you care about matching TX and RX queues? If e2e
processing can't you just store the incoming hash in the flow and reuse
on TX? This is what Linux is doing...



> > 
> > 
> > > 2. In Scenario2 with GRE, the lack of outer transport headers means that
> > > flows between multiple communication pairs encapsulated by the same tunnel
> > > 
> > > will all be hashed to the same queue. To address this, we need to 
> > > implement
> > > inner hashing to improve the performance of RSS. By parsing and 
> > > calculating
> > > 
> > > the inner hash, different flows can be hashed to different queues.
> > > 
> > > Thanks.
> > > 
> > > 
> > Well 2 is at least inexact, there's flowID there. It's just 8 bit
> 
> We use the most basic GRE header fields (not NVGRE), not even optional
> fields.
> There is also no flow id in the GRE header, should you be referring to
> NVGRE?
> 
> Thanks.
> 
> > so not sufficient if there are more than 512 queues. Still 512 queues
> > is quite a lot. Are you trying to solve for configurations with
> > more than 512 queues then?
> > 
> 

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-15 Thread Heng Qi




On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:

On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:



On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:

On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

   it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header.  The most common technique for this is
   to use the UDP source port

The endpoint of the tunnel is called the gateway (with DPDK running on top of it).

1. When there is no inner header hash, entropy can be inserted into the
UDP source port of the tunnel's outer header, and the tunnel packet is
then handed over to the host. The host has to dedicate a portion of its
CPUs to parsing the outer headers (without dropping them) and computing
the inner hash over the inner payloads, and then use that inner hash to
forward the packets to another portion of the CPUs that are responsible
for processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?


Let's simplify some details and take a fresh look at two different
scenarios: VXLAN and GENEVE (Scenario 1) and GRE (Scenario 2).

1. In Scenario 1, we can improve the processing performance of the same
flow by implementing inner symmetric hashing.

This is because even though client1 and client2 communicate
bidirectionally through the same flow, their data may pass through, and
be encapsulated by, different tunnels, resulting in the same flow being
hashed to different queues and processed by different CPUs.

To ensure consistency and optimized processing, we need to parse out the
inner header and compute a symmetric hash over it using a special RSS key.

Sorry for not mentioning the inner symmetric hash before; I wanted to
avoid introducing more concepts, but it is indeed a kind of inner hash.

If parts of a flow go through different tunnels won't this cause
reordering at the network level? Why is it so important to prevent it at
the nic then?  Or, since you are stressing symmetric hash, are you
talking about TX and RX side going through different tunnels?


Yes, the directions client1->client2 and client2->client1 may go through
different tunnels.
Using inner symmetric hashing lets the same CPU process both directions
of the same flow, which improves performance.






2. In Scenario 2 with GRE, the lack of outer transport headers means that
flows between multiple communication pairs encapsulated by the same
tunnel will all be hashed to the same queue. To address this, we need to
implement inner hashing to improve the performance of RSS. By parsing and
calculating the inner hash, different flows can be hashed to different
queues.

Thanks.



Well 2 is at least inexact, there's flowID there. It's just 8 bit


We use the most basic GRE header fields (not NVGRE), not even optional 
fields.
There is also no flow id in the GRE header, should you be referring to 
NVGRE?


Thanks.


so not sufficient if there are more than 512 queues. Still 512 queues
is quite a lot. Are you trying to solve for configurations with
more than 512 queues then?



Assuming that the same flow consists of a unidirectional flow a->b, or a
bidirectional flow a->b and b->a, such a flow may be processed out of
order by the gateway (DPDK):

1. In unidirectional mode, if the same flow is switched to another
gateway for some reason, resulting in a different outer IP address, then
this flow may be processed by different CPUs after reaching the host if
there is no inner hash. So after the host receives the flow, it first
uses the forwarding CPUs to parse out the inner hash, and then uses that
hash to ensure that the flow is processed by the same CPU.
2. In bidirectional mode, the a->b flow may go to gateway 1 and the b->a
flow to gateway 2. To ensure that the same flow is processed by the same
CPU, we still need the forwarding CPUs to parse the real inner hash
(here, the hash key needs to be replaced with a symmetric hash key).

Oh, interesting. What are those gateways? How come there's an expectation
that you can change their addresses and topology completely seamlessly,
without any reordering whatsoever? Isn't a network topology change kind
of guaranteed to change ordering sometimes?



1). During this process, the CPUs on the host is divided into two 

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-15 Thread Michael S. Tsirkin
On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> 
> 
> 
On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > 
> > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > If the tunnel is used to encapsulate the packets, the hash 
> > > > > > > calculated
> > > > > > > using the outer header of the receive packets is always fixed for 
> > > > > > > the
> > > > > > > same flow packets, i.e. they will be steered to the same receive 
> > > > > > > queue.
> > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > inner header hash in the outer source port to solve this?
> > > > > Yes, you are right. That's what we did before the inner header hash, 
> > > > > but it
> > > > > has a performance penalty, which I'll explain below.
> > > > > 
> > > > > > For example geneve spec says:
> > > > > > 
> > > > > >   it is necessary for entropy from encapsulated packets to be
> > > > > >   exposed in the tunnel header.  The most common technique for 
> > > > > > this is
> > > > > >   to use the UDP source port
> > > > > The end point of the tunnel called the gateway (with DPDK on top of 
> > > > > it).
> > > > > 
> > > > > 1. When there is no inner header hash, entropy can be inserted into 
> > > > > the udp
> > > > > src port of the outer header of the tunnel,
> > > > > and then the tunnel packet is handed over to the host. The host needs 
> > > > > to
> > > > > take out a part of the CPUs to parse the outer headers (but not drop 
> > > > > them)
> > > > > to calculate the inner hash for the inner payloads,
> > > > > and then use the inner
> > > > > hash to forward them to another part of the CPUs that are responsible 
> > > > > for
> > > > > processing.
> > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > tunnel, why is your host doing this?
> 
> 
> Let's simplify some details and take a fresh look at two different
> scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> 
> 1. In Scenario1, we can improve the processing performance of the same flow
> by implementing inner symmetric hashing.
> 
> This is because even though client1 and client2 communicate bidirectionally
> through the same flow, their data may pass
> 
> through and be encapsulated by different tunnels, resulting in the same flow
> being hashed to different queues and processed by different CPUs.
> 
> To ensure consistency and optimized processing, we need to parse out the
> inner header and compute a symmetric hash on it using a special rss key.
> 
> Sorry for not mentioning the inner symmetric hash before, in order to
> prevent the introduction of more concepts, but it is indeed a kind of inner
> hash.

If parts of a flow go through different tunnels won't this cause
reordering at the network level? Why is it so important to prevent it at
the nic then?  Or, since you are stressing symmetric hash, are you
talking about TX and RX side going through different tunnels?


> 2. In Scenario2 with GRE, the lack of outer transport headers means that
> flows between multiple communication pairs encapsulated by the same tunnel
> 
> will all be hashed to the same queue. To address this, we need to implement
> inner hashing to improve the performance of RSS. By parsing and calculating
> 
> the inner hash, different flows can be hashed to different queues.
> 
> Thanks.
> 
> 

Well 2 is at least inexact, there's flowID there. It's just 8 bit
so not sufficient if there are more than 512 queues. Still 512 queues
is quite a lot. Are you trying to solve for configurations with
more than 512 queues then?


> > > Assuming that the same flow includes a unidirectional flow a->b, or a
> > > bidirectional flow a->b and b->a,
> > > such flow may be out of order when processed by the gateway(DPDK):
> > > 
> > > 1. In unidirectional mode, if the same flow is switched to another gateway
> > > for some reason, resulting in different outer IP address,
> > >      then this flow may be processed by different CPUs after reaching the
> > > host if there is no inner hash. So after the host receives the
> > >      flow, first use the forwarding CPUs to parse the inner hash, and then
> > > use the hash to ensure that the flow is processed by the
> > >      same CPU.
> > > 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> > > go to gateway 2. In order to ensure that the same flow is
> > >      processed by the same CPU, we still need the forwarding CPUs to parse
> > > the real inner hash(here, the hash key needs to be replaced with a 
> > > symmetric
> > > hash key).
> > Oh interesting. What are those gateways, how come there's expectation
> > that you can change their addresses and topology
> > completely seamlessly without any 

Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-10 Thread Heng Qi





On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:


On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:

On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

  it is necessary for entropy from encapsulated packets to be
  exposed in the tunnel header.  The most common technique for this is
  to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?



Let's simplify some details and take a fresh look at two different 
scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).


1. In Scenario1, we can improve the processing performance of the same 
flow by implementing inner symmetric hashing.


This is because even though client1 and client2 communicate 
bidirectionally through the same flow, their data may pass


through and be encapsulated by different tunnels, resulting in the same 
flow being hashed to different queues and processed by different CPUs.


To ensure consistency and optimized processing, we need to parse out the 
inner header and compute a symmetric hash on it using a special rss key.


Sorry for not mentioning the inner symmetric hash before, in order to 
prevent the introduction of more concepts, but it is indeed a kind of 
inner hash.


2. In Scenario2 with GRE, the lack of outer transport headers means that 
flows between multiple communication pairs encapsulated by the same tunnel


will all be hashed to the same queue. To address this, we need to 
implement inner hashing to improve the performance of RSS. By parsing 
and calculating


the inner hash, different flows can be hashed to different queues.

Thanks.




Assuming that the same flow includes a unidirectional flow a->b, or a
bidirectional flow a->b and b->a,
such flow may be out of order when processed by the gateway(DPDK):

1. In unidirectional mode, if the same flow is switched to another gateway
for some reason, resulting in different outer IP address,
     then this flow may be processed by different CPUs after reaching the
host if there is no inner hash. So after the host receives the
     flow, first use the forwarding CPUs to parse the inner hash, and then
use the hash to ensure that the flow is processed by the
     same CPU.
2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
go to gateway 2. In order to ensure that the same flow is
     processed by the same CPU, we still need the forwarding CPUs to parse
the real inner hash(here, the hash key needs to be replaced with a symmetric
hash key).

Oh interesting. What are those gateways, how come there's expectation
that you can change their addresses and topology
completely seamlessly without any reordering whatsoever?
Isn't network topology change kind of guaranteed to change ordering
sometimes?



1). During this process, the CPUs on the host are divided into two parts, one
part is used as a forwarding node to parse the outer header,
   and the CPU utilization is low. Another part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.


2). The entropy of the source udp src port is not enough, that is, the queue
is not widely distributed.

how isn't it enough? 16 bit is enough to cover all vqs ...

A 5-tuple brings more entropy than a single port, doesn't it?

But you don't need more for RSS, the indirection table is not
that large.


In fact, the
inner hash of the physical network card used by
the business team is indeed better than the udp port number of the outer
header we modify now, but they did not give me the data.

Admittedly, our hash value is 32 bit.


2. When 

Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-09 Thread Michael S. Tsirkin
On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > 
> > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > using the outer header of the receive packets is always fixed for the
> > > > > same flow packets, i.e. they will be steered to the same receive 
> > > > > queue.
> > > > Wait a second. How is this true? Does not everyone stick the
> > > > inner header hash in the outer source port to solve this?
> > > Yes, you are right. That's what we did before the inner header hash, but 
> > > it
> > > has a performance penalty, which I'll explain below.
> > > 
> > > > For example geneve spec says:
> > > > 
> > > >  it is necessary for entropy from encapsulated packets to be
> > > >  exposed in the tunnel header.  The most common technique for this 
> > > > is
> > > >  to use the UDP source port
> > > The end point of the tunnel called the gateway (with DPDK on top of it).
> > > 
> > > 1. When there is no inner header hash, entropy can be inserted into the 
> > > udp
> > > src port of the outer header of the tunnel,
> > > and then the tunnel packet is handed over to the host. The host needs to
> > > take out a part of the CPUs to parse the outer headers (but not drop them)
> > > to calculate the inner hash for the inner payloads,
> > > and then use the inner
> > > hash to forward them to another part of the CPUs that are responsible for
> > > processing.
> > I don't get this part. Leave inner hashes to the guest inside the
> > tunnel, why is your host doing this?
> 
> Assuming that the same flow includes a unidirectional flow a->b, or a
> bidirectional flow a->b and b->a,
> such flow may be out of order when processed by the gateway(DPDK):
> 
> 1. In unidirectional mode, if the same flow is switched to another gateway
> for some reason, resulting in different outer IP address,
>     then this flow may be processed by different CPUs after reaching the
> host if there is no inner hash. So after the host receives the
>     flow, first use the forwarding CPUs to parse the inner hash, and then
> use the hash to ensure that the flow is processed by the
>     same CPU.
> 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> go to gateway 2. In order to ensure that the same flow is
>     processed by the same CPU, we still need the forwarding CPUs to parse
> the real inner hash(here, the hash key needs to be replaced with a symmetric
> hash key).

Oh interesting. What are those gateways, how come there's expectation
that you can change their addresses and topology
completely seamlessly without any reordering whatsoever?
Isn't network topology change kind of guaranteed to change ordering
sometimes?


> > 
> > > 1). During this process, the CPUs on the host are divided into two parts, 
> > > one
> > > part is used as a forwarding node to parse the outer header,
> > >   and the CPU utilization is low. Another part handles packets.
> > Some overhead is clearly involved in *sending* packets -
> > to calculate the hash and stick it in the port number.
> > This is, however, a separate problem and if you want to
> > solve it then my suggestion would be to teach the *transmit*
> > side about GRE offloads, so it can fill the source port in the card.
> > 
> > > 2). The entropy of the source udp src port is not enough, that is, the 
> > > queue
> > > is not widely distributed.
> > how isn't it enough? 16 bit is enough to cover all vqs ...
> 
> A 5-tuple brings more entropy than a single port, doesn't it?

But you don't need more for RSS, the indirection table is not
that large.

> In fact, the
> inner hash of the physical network card used by
> the business team is indeed better than the udp port number of the outer
> header we modify now, but they did not give me the data.

Admittedly, our hash value is 32 bit.

> > > 2. When there is an inner header hash, the gateway will directly help 
> > > parse
> > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > The tunneled packet is then handed over to the host.
> > > 1) All the CPUs of the host are used to process data packets, and there is
> > > no need to use some CPUs to forward and parse the outer header.
> > You really have to parse the outer header anyway,
> > otherwise there's no tunneling.
> > Unless you want to teach virtio to implement tunneling
> > in hardware, which is something I'd find it easier to
> > get behind.
> 
> There is no need to parse the outer header twice, because we use shared
> memory.

shared with what? you need the outer header to identify the tunnel.

> > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > widely distributed.
> > It's exactly the same entropy, why would it be better? In fact 

Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-09 Thread Heng Qi




On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?
For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port

same goes for vxlan did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?



Inner hash can at least hash tunnel flows without outer transport 
headers like GRE to multiple queues,

which is beneficial to us.

For tunnel flows with outer transport headers like VXLAN, although they 
can hash flows to different queues
by setting different outer udp port, this does not conflict with inner 
hash. Inner hashing can also be used for this purpose.


For the same flow, packets in the receiving and sending directions may 
pass through different tunnels respectively, which causes the same flow 
to be hashed to different queues. In this case, we have to calculate a 
symmetric hash (which can be called an inner symmetric hash, a type of 
inner hash) over the inner header, so that the same flow can be hashed 
to the same queue.


Symmetric hashing can ignore the order of the 5-tuple when calculating 
the hash, that is, the hash values calculated for (a1, a2, a3, a4) and 
(a2, a1, a4, a3) are the same.


Thanks.



-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-08 Thread Heng Qi




On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:


On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but it
has a performance penalty, which I'll explain below.


For example geneve spec says:

 it is necessary for entropy from encapsulated packets to be
 exposed in the tunnel header.  The most common technique for this is
 to use the UDP source port

The end point of the tunnel called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the udp
src port of the outer header of the tunnel,
and then the tunnel packet is handed over to the host. The host needs to
take out a part of the CPUs to parse the outer headers (but not drop them)
to calculate the inner hash for the inner payloads,
and then use the inner
hash to forward them to another part of the CPUs that are responsible for
processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?


Assuming that the same flow includes a unidirectional flow a->b, or a 
bidirectional flow a->b and b->a,

such flow may be out of order when processed by the gateway(DPDK):

1. In unidirectional mode, if the same flow is switched to another 
gateway for some reason, resulting in different outer IP address,
    then this flow may be processed by different CPUs after reaching 
the host if there is no inner hash. So after the host receives the
    flow, first use the forwarding CPUs to parse the inner hash, and 
then use the hash to ensure that the flow is processed by the

    same CPU.
2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow 
may go to gateway 2. In order to ensure that the same flow is
    processed by the same CPU, we still need the forwarding CPUs to 
parse the real inner hash(here, the hash key needs to be replaced with a 
symmetric hash key).





1). During this process, the CPUs on the host are divided into two parts, one
part is used as a forwarding node to parse the outer header,
  and the CPU utilization is low. Another part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.


2). The entropy of the source udp src port is not enough, that is, the queue
is not widely distributed.

how isn't it enough? 16 bit is enough to cover all vqs ...


A 5-tuple brings more entropy than a single port, doesn't it? In fact, 
the inner hash of the physical network card used by
the business team is indeed better than the udp port number of the outer 
header we modify now, but they did not give me the data.



2. When there is an inner header hash, the gateway will directly help parse
the outer header, and use the inner 5 tuples to calculate the inner hash.
The tunneled packet is then handed over to the host.
1) All the CPUs of the host are used to process data packets, and there is
no need to use some CPUs to forward and parse the outer header.

You really have to parse the outer header anyway,
otherwise there's no tunneling.
Unless you want to teach virtio to implement tunneling
in hardware, which is something I'd find it easier to
get behind.


There is no need to parse the outer header twice, because we use shared 
memory.



2) The entropy of the original quintuple is sufficient, and the queue is
widely distributed.

It's exactly the same entropy, why would it be better? In fact you
are taking out the outer hash entropy making things worse.


I don't get the point: why would the entropy of the inner 5-tuple and 
the outer tunnel header be the same? Multiple streams can have the same 
outer header.

Thanks.



Thanks.

same goes for vxlan did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: 

[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-08 Thread Michael S. Tsirkin
On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> 
> 
> On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > using the outer header of the receive packets is always fixed for the
> > > same flow packets, i.e. they will be steered to the same receive queue.
> > Wait a second. How is this true? Does not everyone stick the
> > inner header hash in the outer source port to solve this?
> 
> Yes, you are right. That's what we did before the inner header hash, but it
> has a performance penalty, which I'll explain below.
> 
> > For example geneve spec says:
> > 
> > it is necessary for entropy from encapsulated packets to be
> > exposed in the tunnel header.  The most common technique for this is
> > to use the UDP source port
> 
> The end point of the tunnel called the gateway (with DPDK on top of it).
> 
> 1. When there is no inner header hash, entropy can be inserted into the udp
> src port of the outer header of the tunnel,
> and then the tunnel packet is handed over to the host. The host needs to
> take out a part of the CPUs to parse the outer headers (but not drop them)
> to calculate the inner hash for the inner payloads,
> and then use the inner
> hash to forward them to another part of the CPUs that are responsible for
> processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?

> 1). During this process, the CPUs on the host are divided into two parts, one
> part is used as a forwarding node to parse the outer header,
>  and the CPU utilization is low. Another part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.

> 2). The entropy of the source udp src port is not enough, that is, the queue
> is not widely distributed.

how isn't it enough? 16 bit is enough to cover all vqs ...

> 2. When there is an inner header hash, the gateway will directly help parse
> the outer header, and use the inner 5 tuples to calculate the inner hash.
> The tunneled packet is then handed over to the host.
> 1) All the CPUs of the host are used to process data packets, and there is
> no need to use some CPUs to forward and parse the outer header.

You really have to parse the outer header anyway,
otherwise there's no tunneling.
Unless you want to teach virtio to implement tunneling
in hardware, which is something I'd find it easier to
get behind.

> 2) The entropy of the original quintuple is sufficient, and the queue is
> widely distributed.

It's exactly the same entropy, why would it be better? In fact you
are taking out the outer hash entropy making things worse.

> 
> Thanks.
> > 
> > same goes for vxlan did not check further.
> > 
> > so what is the problem?  and which tunnel types actually suffer from the
> > problem?
> > 
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscr...@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
> List help: virtio-comment-h...@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/





Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-02 Thread Michael S. Tsirkin
On Thu, Mar 02, 2023 at 04:59:46PM +0800, Jason Wang wrote:
> On Thu, Mar 2, 2023 at 4:41 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote:
> > > On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > > > > Kernel had already used the eBPF program for hashing, classifying
> > > > > various types of eBPF program other than XDP/socket filter
> > > > > (pass/drop).
> > > > >
> > > > > Thanks
> > > >
> > > > where is it used for hashing?
> > >
> > > I can see it is used by team/lb:
> > >
> > > static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
> > > struct sk_buff *skb)
> > > {
> > > struct bpf_prog *fp;
> > > uint32_t lhash;
> > > unsigned char *c;
> > >
> > > fp = rcu_dereference_bh(lb_priv->fp);
> > > if (unlikely(!fp))
> > > return 0;
> > > lhash = bpf_prog_run(fp, skb);
> > > c = (char *) &lhash;
> > > return c[0] ^ c[1] ^ c[2] ^ c[3];
> > > }
> > >
> > > But the point is that the return value is determined by the prog type
> > > (or the context).
> > >
> > > Thanks
> >
> > OK so assuming we do this, how will users program this exactly?
> 
> For DPDK users, it could be integrated with the PMD.
> For kernel users, it probably requires a virtio-specific netlink or char 
> device.
> 
> > Given this is not standard, which tools will be used to attach such
> > a program to the device?
> 
> vDPA tool?
> 
> Thanks

Ugh.  I think I'd like ethtool to work.




> >
> >
> > > >
> > > > --
> > > > MST
> > > >
> > > >
> > > > -
> > > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> > > >
> >





Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-02 Thread Jason Wang
On Thu, Mar 2, 2023 at 4:41 PM Michael S. Tsirkin  wrote:
>
> On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote:
> > On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > > > Kernel had already used the eBPF program for hashing, classifying
> > > > various types of eBPF program other than XDP/socket filter
> > > > (pass/drop).
> > > >
> > > > Thanks
> > >
> > > where is it used for hashing?
> >
> > I can see it is used by team/lb:
> >
> > static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
> > struct sk_buff *skb)
> > {
> > struct bpf_prog *fp;
> > uint32_t lhash;
> > unsigned char *c;
> >
> > fp = rcu_dereference_bh(lb_priv->fp);
> > if (unlikely(!fp))
> > return 0;
> > lhash = bpf_prog_run(fp, skb);
> > c = (char *) &lhash;
> > return c[0] ^ c[1] ^ c[2] ^ c[3];
> > }
> >
> > But the point is that the return value is determined by the prog type
> > (or the context).
> >
> > Thanks
>
> OK so assuming we do this, how will users program this exactly?

For DPDK users, it could be integrated with the PMD.
For kernel users, it probably requires a virtio-specific netlink or char device.

> Given this is not standard, which tools will be used to attach such
> a program to the device?

vDPA tool?

Thanks

>
>
> > >
> > > --
> > > MST
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> > >
>





Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-02 Thread Michael S. Tsirkin
On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote:
> On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > > Kernel had already used the eBPF program for hashing, classifying
> > > various types of eBPF program other than XDP/socket filter
> > > (pass/drop).
> > >
> > > Thanks
> >
> > where is it used for hashing?
> 
> I can see it is used by team/lb:
> 
> static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
> struct sk_buff *skb)
> {
> struct bpf_prog *fp;
> uint32_t lhash;
> unsigned char *c;
> 
> fp = rcu_dereference_bh(lb_priv->fp);
> if (unlikely(!fp))
> return 0;
> lhash = bpf_prog_run(fp, skb);
> c = (char *) &lhash;
> return c[0] ^ c[1] ^ c[2] ^ c[3];
> }
> 
> But the point is that the return value is determined by the prog type
> (or the context).
> 
> Thanks

OK so assuming we do this, how will users program this exactly?
Given this is not standard, which tools will be used to attach such
a program to the device?


> >
> > --
> > MST
> >
> >
> > -
> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> >





Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-02 Thread Jason Wang
On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin  wrote:
>
> On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > Kernel had already used the eBPF program for hashing, classifying
> > various types of eBPF program other than XDP/socket filter
> > (pass/drop).
> >
> > Thanks
>
> where is it used for hashing?

I can see it is used by team/lb:

static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
struct sk_buff *skb)
{
struct bpf_prog *fp;
uint32_t lhash;
unsigned char *c;

fp = rcu_dereference_bh(lb_priv->fp);
if (unlikely(!fp))
return 0;
lhash = bpf_prog_run(fp, skb);
c = (char *) &lhash;
return c[0] ^ c[1] ^ c[2] ^ c[3];
}

But the point is that the return value is determined by the prog type
(or the context).

Thanks
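
[Editor's note: the XOR fold at the end of lb_get_skb_hash() above collapses
the 32-bit value returned by the BPF program into an 8-bit port index. A
minimal sketch of that fold, in Python rather than kernel C, purely for
illustration — it is not part of the thread:]

```python
def fold_hash(lhash: int) -> int:
    """Sketch of the XOR fold in lb_get_skb_hash(): XOR the four
    bytes of a 32-bit hash together to yield an 8-bit value.
    XOR is commutative, so the byte order chosen here is irrelevant."""
    b = (lhash & 0xFFFFFFFF).to_bytes(4, "little")
    return b[0] ^ b[1] ^ b[2] ^ b[3]
```

For example, fold_hash(0x12345678) equals 0x12 ^ 0x34 ^ 0x56 ^ 0x78.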

>
> --
> MST
>
>
>





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-02 Thread Michael S. Tsirkin
On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> Kernel had already used the eBPF program for hashing, classifying
> various types of eBPF program other than XDP/socket filter
> (pass/drop).
> 
> Thanks

where is it used for hashing?

-- 
MST





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-01 Thread Jason Wang
On Thu, Mar 2, 2023 at 3:42 PM Michael S. Tsirkin  wrote:
>
> On Thu, Mar 02, 2023 at 10:57:12AM +0800, Jason Wang wrote:
> > On Wed, Mar 1, 2023 at 6:36 PM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> > > > On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > > > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > > > > Btw, this kind of 1:1 hash features seems not scalable and 
> > > > > > > > > > flexible.
> > > > > > > > > > It requires an endless extension on bits/fields. Modern 
> > > > > > > > > > NICs allow the
> > > > > > > > > > user to customize the hash calculation, for virtio-net we 
> > > > > > > > > > can allow to
> > > > > > > > > > use eBPF program to classify the packets. It seems to be 
> > > > > > > > > > more flexible
> > > > > > > > > > and scalable and there's almost no maintain burden in the 
> > > > > > > > > > spec (only
> > > > > > > > > > bytecode is required, no need any fancy 
> > > > > > > > > > features/interactions like
> > > > > > > > > > maps), easy to be migrated etc.
> > > > > > > > > >
> > > > > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for 
> > > > > > > > > > years.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Yea BPF offload would be great to have. We have been 
> > > > > > > > > discussing it for
> > > > > > > > > years though - security issues keep blocking it. *Maybe* it's 
> > > > > > > > > finally
> > > > > > > > > going to be there but I'm not going to block this work 
> > > > > > > > > waiting for BPF
> > > > > > > > > offload.  And easily migrated is what BPF is not.
> > > > > > > >
> > > > > > > > Just to make sure we're at the same page. I meant to find a way 
> > > > > > > > to
> > > > > > > > allow the driver/user to fully customize what it wants to
> > > > > > > > hash/classify. Similar technologies which is based on private 
> > > > > > > > solution
> > > > > > > > has been used by some vendors, which allow user to customize the
> > > > > > > > classifier[1]
> > > > > > > >
> > > > > > > > eBPF looks like a good open-source solution candidate for this 
> > > > > > > > (there
> > > > > > > > could be others). But there could be many kinds of eBPF 
> > > > > > > > programs that
> > > > > > > > could be offloaded. One famous one is XDP which requires many 
> > > > > > > > features
> > > > > > > > other than the bytecode/VM like map access, tailcall. Starting 
> > > > > > > > from
> > > > > > > > such a complicated type is hard. Instead, we can start from a 
> > > > > > > > simple
> > > > > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > > > > bytecode to the device, the device can choose to run it or 
> > > > > > > > compile it
> > > > > > > > to what it can understand for classifying. We don't need maps, 
> > > > > > > > tail
> > > > > > > > calls and other features.
> > > > > > >
> > > > > > > Until people start asking exactly for maps because they want
> > > > > > > state for their classifier?
> > > > > >
> > > > > > Yes, but let's compare the eBPF without maps with the static feature
> > > > > > proposed here. It is much more scalable and flexible.
> > > > >
> > > > > I looked for some examples of RSS using BPF and only found this:
> > > > > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > > > > seems to use maps.
> > > >
> > > > Yes and this is also the way we emulate RSS with TUN/TAP via steering
> > > > eBPF support for TUN/TAP. The reason is that it needs to emulate not
> > > > only the hash but also the indirection. If we only replace the hash
> > > > function with the eBPF program but reuse the RSS indirection table, we
> > > > don't need maps.
> > >
> > > How? Add a special helper?
> >
> > We can let the eBPF program return the hash:
> >
> > [eBPF hashing] -> hash value -> [indirection table lookup]
> >
> > Note that if we don't consider future full eBPF offloading, we can
> > start with classical BPF.
> >
> > Thanks
>
> So again this is a custom thing not a standard use of BPF.
> Normally value returned is pass/drop.

AFAIK there's no standard here. The semantics of the return value are
determined by the context of the (e)BPF program.

The kernel already uses eBPF programs for hashing and classifying;
there are various types of eBPF program other than the XDP/socket
filter (pass/drop) kind.

Thanks

>
> > >
> > > > >
> > > > >
> > > > > > > And it makes sense - if you want
> > > > > > > e.g. load balancing you need stats which needs maps.
> > > > > >
> > > > > > Yes, but we know 

[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-01 Thread Michael S. Tsirkin
On Thu, Mar 02, 2023 at 10:57:12AM +0800, Jason Wang wrote:
> On Wed, Mar 1, 2023 at 6:36 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> > > On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > > > Btw, this kind of 1:1 hash features seems not scalable and 
> > > > > > > > > flexible.
> > > > > > > > > It requires an endless extension on bits/fields. Modern NICs 
> > > > > > > > > allow the
> > > > > > > > > user to customize the hash calculation, for virtio-net we can 
> > > > > > > > > allow to
> > > > > > > > > use eBPF program to classify the packets. It seems to be more 
> > > > > > > > > flexible
> > > > > > > > > and scalable and there's almost no maintain burden in the 
> > > > > > > > > spec (only
> > > > > > > > > bytecode is required, no need any fancy features/interactions 
> > > > > > > > > like
> > > > > > > > > maps), easy to be migrated etc.
> > > > > > > > >
> > > > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for 
> > > > > > > > > years.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Yea BPF offload would be great to have. We have been discussing 
> > > > > > > > it for
> > > > > > > > years though - security issues keep blocking it. *Maybe* it's 
> > > > > > > > finally
> > > > > > > > going to be there but I'm not going to block this work waiting 
> > > > > > > > for BPF
> > > > > > > > offload.  And easily migrated is what BPF is not.
> > > > > > >
> > > > > > > Just to make sure we're at the same page. I meant to find a way to
> > > > > > > allow the driver/user to fully customize what it wants to
> > > > > > > hash/classify. Similar technologies which is based on private 
> > > > > > > solution
> > > > > > > has been used by some vendors, which allow user to customize the
> > > > > > > classifier[1]
> > > > > > >
> > > > > > > eBPF looks like a good open-source solution candidate for this 
> > > > > > > (there
> > > > > > > could be others). But there could be many kinds of eBPF programs 
> > > > > > > that
> > > > > > > could be offloaded. One famous one is XDP which requires many 
> > > > > > > features
> > > > > > > other than the bytecode/VM like map access, tailcall. Starting 
> > > > > > > from
> > > > > > > such a complicated type is hard. Instead, we can start from a 
> > > > > > > simple
> > > > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > > > bytecode to the device, the device can choose to run it or 
> > > > > > > compile it
> > > > > > > to what it can understand for classifying. We don't need maps, 
> > > > > > > tail
> > > > > > > calls and other features.
> > > > > >
> > > > > > Until people start asking exactly for maps because they want
> > > > > > state for their classifier?
> > > > >
> > > > > Yes, but let's compare the eBPF without maps with the static feature
> > > > > proposed here. It is much more scalable and flexible.
> > > >
> > > > I looked for some examples of RSS using BPF and only found this:
> > > > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > > > seems to use maps.
> > >
> > > Yes and this is also the way we emulate RSS with TUN/TAP via steering
> > > eBPF support for TUN/TAP. The reason is that it needs to emulate not
> > > only the hash but also the indirection. If we only replace the hash
> > > function with the eBPF program but reuse the RSS indirection table, we
> > > don't need maps.
> >
> > How? Add a special helper?
> 
> We can let the eBPF program return the hash:
> 
> [eBPF hashing] -> hash value -> [indirection table lookup]
> 
> Note that if we don't consider future full eBPF offloading, we can
> start with classical BPF.
> 
> Thanks

So again, this is a custom thing, not a standard use of BPF.
Normally the value returned is pass/drop.

> >
> > > >
> > > >
> > > > > > And it makes sense - if you want
> > > > > > e.g. load balancing you need stats which needs maps.
> > > > >
> > > > > Yes, but we know it's possible to have that (through the XDP offload).
> > > >
> > > > Not without a lot more work to make xdp offload happen.
> > > >
> > >
> > > Yes, that's why a simple eBPF RSS hashing program looks much easier.
> > >
> > > Thanks
> >
> > Notice that at this point this is no longer a generic BPF - you
> > are using a special helper. For tunnels I would imagine two tables
> > could easily turn out to be useful. Then what? Another table?
> > If yes then I can't say I like where this is going ...
> >
> > > > > This is impossible with 

[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-01 Thread Jason Wang
On Wed, Mar 1, 2023 at 6:36 PM Michael S. Tsirkin  wrote:
>
> On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> > On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > > Btw, this kind of 1:1 hash features seems not scalable and 
> > > > > > > > flexible.
> > > > > > > > It requires an endless extension on bits/fields. Modern NICs 
> > > > > > > > allow the
> > > > > > > > user to customize the hash calculation, for virtio-net we can 
> > > > > > > > allow to
> > > > > > > > use eBPF program to classify the packets. It seems to be more 
> > > > > > > > flexible
> > > > > > > > and scalable and there's almost no maintain burden in the spec 
> > > > > > > > (only
> > > > > > > > bytecode is required, no need any fancy features/interactions 
> > > > > > > > like
> > > > > > > > maps), easy to be migrated etc.
> > > > > > > >
> > > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for 
> > > > > > > > years.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > Yea BPF offload would be great to have. We have been discussing 
> > > > > > > it for
> > > > > > > years though - security issues keep blocking it. *Maybe* it's 
> > > > > > > finally
> > > > > > > going to be there but I'm not going to block this work waiting 
> > > > > > > for BPF
> > > > > > > offload.  And easily migrated is what BPF is not.
> > > > > >
> > > > > > Just to make sure we're at the same page. I meant to find a way to
> > > > > > allow the driver/user to fully customize what it wants to
> > > > > > hash/classify. Similar technologies which is based on private 
> > > > > > solution
> > > > > > has been used by some vendors, which allow user to customize the
> > > > > > classifier[1]
> > > > > >
> > > > > > eBPF looks like a good open-source solution candidate for this 
> > > > > > (there
> > > > > > could be others). But there could be many kinds of eBPF programs 
> > > > > > that
> > > > > > could be offloaded. One famous one is XDP which requires many 
> > > > > > features
> > > > > > other than the bytecode/VM like map access, tailcall. Starting from
> > > > > > such a complicated type is hard. Instead, we can start from a simple
> > > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > > bytecode to the device, the device can choose to run it or compile 
> > > > > > it
> > > > > > to what it can understand for classifying. We don't need maps, tail
> > > > > > calls and other features.
> > > > >
> > > > > Until people start asking exactly for maps because they want
> > > > > state for their classifier?
> > > >
> > > > Yes, but let's compare the eBPF without maps with the static feature
> > > > proposed here. It is much more scalable and flexible.
> > >
> > > I looked for some examples of RSS using BPF and only found this:
> > > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > > seems to use maps.
> >
> > Yes and this is also the way we emulate RSS with TUN/TAP via steering
> > eBPF support for TUN/TAP. The reason is that it needs to emulate not
> > only the hash but also the indirection. If we only replace the hash
> > function with the eBPF program but reuse the RSS indirection table, we
> > don't need maps.
>
> How? Add a special helper?

We can let the eBPF program return the hash:

[eBPF hashing] -> hash value -> [indirection table lookup]

Note that if we don't consider future full eBPF offloading, we can
start with classical BPF.

Thanks
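
[Editor's note: the pipeline above can be sketched in a few lines. This is a
hedged illustration, not part of the thread; the power-of-two table size and
mask follow the virtio-net RSS convention of an indirection table indexed by
the low-order hash bits, which is assumed here:]

```python
def rss_queue(hash_value: int, indirection_table: list[int]) -> int:
    """The eBPF program would replace only the hash step; the
    indirection-table lookup stays as in ordinary RSS.  Assumes a
    power-of-two table so the low-order hash bits can be masked in."""
    mask = len(indirection_table) - 1
    return indirection_table[hash_value & mask]

# e.g. an 8-entry table spreading flows over 4 receive queues
table = [0, 1, 2, 3, 0, 1, 2, 3]
```

With this split, no maps are needed: the program returns the hash, and the
device keeps doing the table lookup itself.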

>
> > >
> > >
> > > > > And it makes sense - if you want
> > > > > e.g. load balancing you need stats which needs maps.
> > > >
> > > > Yes, but we know it's possible to have that (through the XDP offload).
> > >
> > > Not without a lot more work to make xdp offload happen.
> > >
> >
> > Yes, that's why a simple eBPF RSS hashing program looks much easier.
> >
> > Thanks
>
> Notice that at this point this is no longer a generic BPF - you
> are using a special helper. For tunnels I would imagine two tables
> could easily turn out to be useful. Then what? Another table?
> If yes then I can't say I like where this is going ...
>
> > > > This is impossible with the approach proposed here.
> > > >
> > > > >
> > > > > > We don't need to worry about the security
> > > > > > because of its simplicity: the eBPF program is only in charge of 
> > > > > > doing
> > > > > > classification, no other interactions with the driver and packet
> > > > > > modification is prohibited. The feature is limited only to the
> > > > > > VM/bytecode abstraction itself.
> > > > > >
> > > > > > What's more, 

[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-03-01 Thread Michael S. Tsirkin
On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > Btw, this kind of 1:1 hash features seems not scalable and 
> > > > > > > flexible.
> > > > > > > It requires an endless extension on bits/fields. Modern NICs 
> > > > > > > allow the
> > > > > > > user to customize the hash calculation, for virtio-net we can 
> > > > > > > allow to
> > > > > > > use eBPF program to classify the packets. It seems to be more 
> > > > > > > flexible
> > > > > > > and scalable and there's almost no maintain burden in the spec 
> > > > > > > (only
> > > > > > > bytecode is required, no need any fancy features/interactions like
> > > > > > > maps), easy to be migrated etc.
> > > > > > >
> > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Yea BPF offload would be great to have. We have been discussing it 
> > > > > > for
> > > > > > years though - security issues keep blocking it. *Maybe* it's 
> > > > > > finally
> > > > > > going to be there but I'm not going to block this work waiting for 
> > > > > > BPF
> > > > > > offload.  And easily migrated is what BPF is not.
> > > > >
> > > > > Just to make sure we're at the same page. I meant to find a way to
> > > > > allow the driver/user to fully customize what it wants to
> > > > > hash/classify. Similar technologies which is based on private solution
> > > > > has been used by some vendors, which allow user to customize the
> > > > > classifier[1]
> > > > >
> > > > > eBPF looks like a good open-source solution candidate for this (there
> > > > > could be others). But there could be many kinds of eBPF programs that
> > > > > could be offloaded. One famous one is XDP which requires many features
> > > > > other than the bytecode/VM like map access, tailcall. Starting from
> > > > > such a complicated type is hard. Instead, we can start from a simple
> > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > bytecode to the device, the device can choose to run it or compile it
> > > > > to what it can understand for classifying. We don't need maps, tail
> > > > > calls and other features.
> > > >
> > > > Until people start asking exactly for maps because they want
> > > > state for their classifier?
> > >
> > > Yes, but let's compare the eBPF without maps with the static feature
> > > proposed here. It is much more scalable and flexible.
> >
> > I looked for some examples of RSS using BPF and only found this:
> > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > seems to use maps.
> 
> Yes and this is also the way we emulate RSS with TUN/TAP via steering
> eBPF support for TUN/TAP. The reason is that it needs to emulate not
> only the hash but also the indirection. If we only replace the hash
> function with the eBPF program but reuse the RSS indirection table, we
> don't need maps.

How? Add a special helper?

> >
> >
> > > > And it makes sense - if you want
> > > > e.g. load balancing you need stats which needs maps.
> > >
> > > Yes, but we know it's possible to have that (through the XDP offload).
> >
> > Not without a lot more work to make xdp offload happen.
> >
> 
> Yes, that's why a simple eBPF RSS hashing program looks much easier.
> 
> Thanks

Notice that at this point this is no longer a generic BPF - you
are using a special helper. For tunnels I would imagine two tables
could easily turn out to be useful. Then what? Another table?
If yes then I can't say I like where this is going ...

> > > This is impossible with the approach proposed here.
> > >
> > > >
> > > > > We don't need to worry about the security
> > > > > because of its simplicity: the eBPF program is only in charge of doing
> > > > > classification, no other interactions with the driver and packet
> > > > > modification is prohibited. The feature is limited only to the
> > > > > VM/bytecode abstraction itself.
> > > > >
> > > > > What's more, it's a good first step to achieve full eBPF offloading in
> > > > > the future.
> > > > >
> > > > > Thanks
> > > > >
> > > > > [1] 
> > > > > https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> > > >
> > > > Dave seems to have nacked this approach, no?
> > >
> > > I may miss something but looking at kernel commit, there are few
> > > patches to support that:
> > >
> > > E.g
> > >
> > > commit c7648810961682b9388be2dd041df06915647445
> > > Author: Tony Nguyen 
> > > Date:   Mon Sep 9 06:47:44 2019 

Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Heng Qi




On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

If the tunnel is used to encapsulate the packets, the hash calculated
using the outer header of the receive packets is always fixed for the
same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?


Yes, you are right. That's what we did before the inner header hash, but 
it has a performance penalty, which I'll explain below.



For example geneve spec says:

it is necessary for entropy from encapsulated packets to be
exposed in the tunnel header.  The most common technique for this is
to use the UDP source port


The endpoint of the tunnel is called the gateway (running DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the
UDP source port of the tunnel's outer header, and the tunnel packet is
then handed over to the host. The host has to dedicate part of its CPUs
to parsing the outer headers (without dropping them), calculating the
inner hash over the inner payloads, and then forwarding the packets by
that hash to the remaining CPUs that are responsible for processing.
1). During this process, the host CPUs are divided into two parts: one
part acts as a forwarding node that parses the outer headers, and its
CPU utilization is low; the other part processes the packets.
2). The entropy from the outer UDP source port is not enough, i.e. the
packets are not spread widely across the queues.


2. When there is an inner header hash, the gateway directly parses the
outer header and uses the inner 5-tuple to calculate the inner hash.

The tunneled packet is then handed over to the host.
1) All of the host's CPUs are used to process data packets; none need
to be set aside to forward packets and parse outer headers.
2) The entropy of the original 5-tuple is sufficient, and the packets
are spread widely across the queues.


Thanks.
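
[Editor's note: the entropy insertion described in case 1 can be sketched as
follows. This is an illustration, not part of the thread: the ephemeral port
range 49152-65535 follows the common VXLAN (RFC 7348) recommendation of
deriving the outer UDP source port from a hash of the inner headers, and
zlib.crc32 is only a stand-in for whatever flow hash the gateway really uses:]

```python
import zlib

def outer_udp_sport(inner_5tuple: tuple) -> int:
    """Map a hash of the inner 5-tuple into the dynamic port range
    so the outer UDP source port carries per-flow entropy."""
    h = zlib.crc32(repr(inner_5tuple).encode())  # stand-in flow hash
    return 49152 + h % (65536 - 49152)
```

The port is deterministic per flow, so all packets of one inner flow still
land on one queue; distinct flows get spread by the outer header alone.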



same goes for VXLAN; I did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?







[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Jason Wang
On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin  wrote:
>
> On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > > It requires an endless extension on bits/fields. Modern NICs allow 
> > > > > > the
> > > > > > user to customize the hash calculation, for virtio-net we can allow 
> > > > > > to
> > > > > > use eBPF program to classify the packets. It seems to be more 
> > > > > > flexible
> > > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > > bytecode is required, no need any fancy features/interactions like
> > > > > > maps), easy to be migrated etc.
> > > > > >
> > > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > > going to be there but I'm not going to block this work waiting for BPF
> > > > > offload.  And easily migrated is what BPF is not.
> > > >
> > > > Just to make sure we're at the same page. I meant to find a way to
> > > > allow the driver/user to fully customize what it wants to
> > > > hash/classify. Similar technologies which is based on private solution
> > > > has been used by some vendors, which allow user to customize the
> > > > classifier[1]
> > > >
> > > > eBPF looks like a good open-source solution candidate for this (there
> > > > could be others). But there could be many kinds of eBPF programs that
> > > > could be offloaded. One famous one is XDP which requires many features
> > > > other than the bytecode/VM like map access, tailcall. Starting from
> > > > such a complicated type is hard. Instead, we can start from a simple
> > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > bytecode to the device, the device can choose to run it or compile it
> > > > to what it can understand for classifying. We don't need maps, tail
> > > > calls and other features.
> > >
> > > Until people start asking exactly for maps because they want
> > > state for their classifier?
> >
> > Yes, but let's compare the eBPF without maps with the static feature
> > proposed here. It is much more scalable and flexible.
>
> I looked for some examples of RSS using BPF and only found this:
> https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> seems to use maps.

Yes and this is also the way we emulate RSS with TUN/TAP via steering
eBPF support for TUN/TAP. The reason is that it needs to emulate not
only the hash but also the indirection. If we only replace the hash
function with the eBPF program but reuse the RSS indirection table, we
don't need maps.

>
>
> > > And it makes sense - if you want
> > > e.g. load balancing you need stats which needs maps.
> >
> > Yes, but we know it's possible to have that (through the XDP offload).
>
> Not without a lot more work to make xdp offload happen.
>

Yes, that's why a simple eBPF RSS hashing program looks much easier.

Thanks

> > This is impossible with the approach proposed here.
> >
> > >
> > > > We don't need to worry about the security
> > > > because of its simplicity: the eBPF program is only in charge of doing
> > > > classification, no other interactions with the driver and packet
> > > > modification is prohibited. The feature is limited only to the
> > > > VM/bytecode abstraction itself.
> > > >
> > > > What's more, it's a good first step to achieve full eBPF offloading in
> > > > the future.
> > > >
> > > > Thanks
> > > >
> > > > [1] 
> > > > https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> > >
> > > Dave seems to have nacked this approach, no?
> >
> > I may miss something but looking at kernel commit, there are few
> > patches to support that:
> >
> > E.g
> >
> > commit c7648810961682b9388be2dd041df06915647445
> > Author: Tony Nguyen 
> > Date:   Mon Sep 9 06:47:44 2019 -0700
> >
> > ice: Implement Dynamic Device Personalization (DDP) download
> >
> > And it has been used by DPDK drivers.
> >
> > Thanks
> >
> > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> > >
>





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Michael S. Tsirkin
On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> If the tunnel is used to encapsulate the packets, the hash calculated
> using the outer header of the receive packets is always fixed for the
> same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Does not everyone stick the
inner header hash in the outer source port to solve this?
For example geneve spec says:

   it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header.  The most common technique for this is
   to use the UDP source port

same goes for VXLAN; I did not check further.

so what is the problem?  and which tunnel types actually suffer from the
problem?

-- 
MST





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Michael S. Tsirkin
On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  wrote:
> >
> > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > bytecode is required, no need any fancy features/interactions like
> > > > > maps), easy to be migrated etc.
> > > > >
> > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > >
> > > > > Thanks
> > > >
> > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > going to be there but I'm not going to block this work waiting for BPF
> > > > offload.  And easily migrated is what BPF is not.
> > >
> > > Just to make sure we're at the same page. I meant to find a way to
> > > allow the driver/user to fully customize what it wants to
> > > hash/classify. Similar technologies which is based on private solution
> > > has been used by some vendors, which allow user to customize the
> > > classifier[1]
> > >
> > > eBPF looks like a good open-source solution candidate for this (there
> > > could be others). But there could be many kinds of eBPF programs that
> > > could be offloaded. One famous one is XDP which requires many features
> > > other than the bytecode/VM like map access, tailcall. Starting from
> > > such a complicated type is hard. Instead, we can start from a simple
> > > type, that is the eBPF classifier. All it needs is to pass the
> > > bytecode to the device, the device can choose to run it or compile it
> > > to what it can understand for classifying. We don't need maps, tail
> > > calls and other features.
> >
> > Until people start asking exactly for maps because they want
> > state for their classifier?
> 
> Yes, but let's compare the eBPF without maps with the static feature
> proposed here. It is much more scalable and flexible.

I looked for some examples of RSS using BPF and only found this:
https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
seems to use maps.


> > And it makes sense - if you want
> > e.g. load balancing you need stats which needs maps.
> 
> Yes, but we know it's possible to have that (through the XDP offload).

Not without a lot more work to make xdp offload happen.

> This is impossible with the approach proposed here.
> 
> >
> > > We don't need to worry about security, because of its simplicity:
> > > the eBPF program is only in charge of classification; it has no
> > > other interactions with the driver, and packet modification is
> > > prohibited. The feature is limited to the VM/bytecode abstraction
> > > itself.
> > >
> > > What's more, it's a good first step to achieve full eBPF offloading in
> > > the future.
> > >
> > > Thanks
> > >
> > > [1] 
> > > https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> >
> > Dave seems to have nacked this approach, no?
> 
> I may be missing something, but looking at kernel commits, there are a few
> patches to support that:
> 
> E.g
> 
> commit c7648810961682b9388be2dd041df06915647445
> Author: Tony Nguyen 
> Date:   Mon Sep 9 06:47:44 2019 -0700
> 
> ice: Implement Dynamic Device Personalization (DDP) download
> 
> And it has been used by DPDK drivers.
> 
> Thanks
> 
> >
> > > >
> > > > --
> > > > MST
> > > >
> >


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Heng Qi




在 2023/2/28 下午4:52, Michael S. Tsirkin 写道:

On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:

On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  wrote:

On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:

On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  wrote:

On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:

Btw, this kind of 1:1 hash feature does not seem scalable or flexible.
It requires an endless extension of bits/fields. Modern NICs allow the
user to customize the hash calculation; for virtio-net we can allow an
eBPF program to classify the packets. That seems more flexible and
scalable, there's almost no maintenance burden in the spec (only
bytecode is required, no need for any fancy features/interactions like
maps), it's easy to migrate, etc.

Prototype is also easy, tun/tap had an eBPF classifier for years.

Thanks

Yea BPF offload would be great to have. We have been discussing it for
years though - security issues keep blocking it. *Maybe* it's finally
going to be there but I'm not going to block this work waiting for BPF
offload.  And easily migrated is what BPF is not.

Just to make sure we're on the same page: I meant to find a way to
allow the driver/user to fully customize what it wants to
hash/classify. Similar technology, based on a private solution, has
been used by some vendors to allow users to customize the
classifier [1].

eBPF looks like a good open-source candidate for this (there
could be others). But there are many kinds of eBPF programs that
could be offloaded. One famous one is XDP, which requires many features
beyond the bytecode/VM, like map access and tail calls. Starting from
such a complicated type is hard. Instead, we can start from a simple
type: the eBPF classifier. All it needs is to pass the
bytecode to the device; the device can choose to run it or compile it
into something it can understand for classifying. We don't need maps,
tail calls or other features.

Until people start asking exactly for maps because they want
state for their classifier?

Yes, but let's compare the eBPF without maps with the static feature
proposed here. It is much more scalable and flexible.


And it makes sense - if you want
e.g. load balancing you need stats which needs maps.

Yes, but we know it's possible to have that (through the XDP offload).
This is impossible with the approach proposed here.

I'm not actually objecting. And at least we then don't need to
worry about leaking info - it's not virtio leaking info
it's the bpf program. I wonder what Heng Qi thinks.
Heng Qi, would it work for your scenario?


We are positive about eBPF, which looks adequate for our scenario.
However, offloading it currently has some problems: the interfaces are
immature and unstable, and user-unfriendly eBPF programs may consume a
lot of device resources. Device support for eBPF will also take time.
Also, the presence of eBPF offload does not conflict with other
solutions; e.g. we still have RSS.


Our goal is to get this patch merged first. We have not yet collected
internal requirements for eBPF offload support, but it is indeed a good
direction.


Thanks.




We don't need to worry about security, because of its simplicity:
the eBPF program is only in charge of classification; it has no
other interactions with the driver, and packet modification is
prohibited. The feature is limited to the VM/bytecode abstraction
itself.

What's more, it's a good first step to achieve full eBPF offloading in
the future.

Thanks

[1] 
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html

Dave seems to have nacked this approach, no?

I may be missing something, but looking at kernel commits, there are a few
patches to support that:

E.g

commit c7648810961682b9388be2dd041df06915647445
Author: Tony Nguyen 
Date:   Mon Sep 9 06:47:44 2019 -0700

 ice: Implement Dynamic Device Personalization (DDP) download

And it has been used by DPDK drivers.

Thanks

If we are talking about netdev then this discussion has to take place on netdev.
If it's dpdk this is more believable.


--
MST






[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-28 Thread Michael S. Tsirkin
On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  wrote:
> >
> > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > Btw, this kind of 1:1 hash feature does not seem scalable or flexible.
> > > > > It requires an endless extension of bits/fields. Modern NICs allow the
> > > > > user to customize the hash calculation; for virtio-net we can allow an
> > > > > eBPF program to classify the packets. That seems more flexible and
> > > > > scalable, there's almost no maintenance burden in the spec (only
> > > > > bytecode is required, no need for any fancy features/interactions like
> > > > > maps), it's easy to migrate, etc.
> > > > >
> > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > >
> > > > > Thanks
> > > >
> > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > going to be there but I'm not going to block this work waiting for BPF
> > > > offload.  And easily migrated is what BPF is not.
> > >
> > > Just to make sure we're on the same page: I meant to find a way to
> > > allow the driver/user to fully customize what it wants to
> > > hash/classify. Similar technology, based on a private solution, has
> > > been used by some vendors to allow users to customize the
> > > classifier [1].
> > >
> > > eBPF looks like a good open-source candidate for this (there
> > > could be others). But there are many kinds of eBPF programs that
> > > could be offloaded. One famous one is XDP, which requires many features
> > > beyond the bytecode/VM, like map access and tail calls. Starting from
> > > such a complicated type is hard. Instead, we can start from a simple
> > > type: the eBPF classifier. All it needs is to pass the
> > > bytecode to the device; the device can choose to run it or compile it
> > > into something it can understand for classifying. We don't need maps,
> > > tail calls or other features.
> >
> > Until people start asking exactly for maps because they want
> > state for their classifier?
> 
> Yes, but let's compare the eBPF without maps with the static feature
> proposed here. It is much more scalable and flexible.
> 
> > And it makes sense - if you want
> > e.g. load balancing you need stats which needs maps.
> 
> Yes, but we know it's possible to have that (through the XDP offload).
> This is impossible with the approach proposed here.

I'm not actually objecting. And at least we then don't need to
worry about leaking info - it's not virtio leaking info
it's the bpf program. I wonder what Heng Qi thinks.
Heng Qi, would it work for your scenario?

> >
> > > We don't need to worry about security, because of its simplicity:
> > > the eBPF program is only in charge of classification; it has no
> > > other interactions with the driver, and packet modification is
> > > prohibited. The feature is limited to the VM/bytecode abstraction
> > > itself.
> > >
> > > What's more, it's a good first step to achieve full eBPF offloading in
> > > the future.
> > >
> > > Thanks
> > >
> > > [1] 
> > > https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> >
> > Dave seems to have nacked this approach, no?
> 
> I may be missing something, but looking at kernel commits, there are a few
> patches to support that:
> 
> E.g
> 
> commit c7648810961682b9388be2dd041df06915647445
> Author: Tony Nguyen 
> Date:   Mon Sep 9 06:47:44 2019 -0700
> 
> ice: Implement Dynamic Device Personalization (DDP) download
> 
> And it has been used by DPDK drivers.
> 
> Thanks

If we are talking about netdev then this discussion has to take place on netdev.
If it's dpdk this is more believable.

> >
> > > >
> > > > --
> > > > MST
> > > >
> >





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-27 Thread Jason Wang
On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin  wrote:
>
> On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > Btw, this kind of 1:1 hash feature does not seem scalable or flexible.
> > > > It requires an endless extension of bits/fields. Modern NICs allow the
> > > > user to customize the hash calculation; for virtio-net we can allow an
> > > > eBPF program to classify the packets. That seems more flexible and
> > > > scalable, there's almost no maintenance burden in the spec (only
> > > > bytecode is required, no need for any fancy features/interactions like
> > > > maps), it's easy to migrate, etc.
> > > >
> > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > >
> > > > Thanks
> > >
> > > Yea BPF offload would be great to have. We have been discussing it for
> > > years though - security issues keep blocking it. *Maybe* it's finally
> > > going to be there but I'm not going to block this work waiting for BPF
> > > offload.  And easily migrated is what BPF is not.
> >
> > Just to make sure we're on the same page: I meant to find a way to
> > allow the driver/user to fully customize what it wants to
> > hash/classify. Similar technology, based on a private solution, has
> > been used by some vendors to allow users to customize the
> > classifier [1].
> >
> > eBPF looks like a good open-source candidate for this (there
> > could be others). But there are many kinds of eBPF programs that
> > could be offloaded. One famous one is XDP, which requires many features
> > beyond the bytecode/VM, like map access and tail calls. Starting from
> > such a complicated type is hard. Instead, we can start from a simple
> > type: the eBPF classifier. All it needs is to pass the
> > bytecode to the device; the device can choose to run it or compile it
> > into something it can understand for classifying. We don't need maps,
> > tail calls or other features.
>
> Until people start asking exactly for maps because they want
> state for their classifier?

Yes, but let's compare the eBPF without maps with the static feature
proposed here. It is much more scalable and flexible.

> And it makes sense - if you want
> e.g. load balancing you need stats which needs maps.

Yes, but we know it's possible to have that (through the XDP offload).
This is impossible with the approach proposed here.

>
> > We don't need to worry about security, because of its simplicity:
> > the eBPF program is only in charge of classification; it has no
> > other interactions with the driver, and packet modification is
> > prohibited. The feature is limited to the VM/bytecode abstraction
> > itself.
> >
> > What's more, it's a good first step to achieve full eBPF offloading in
> > the future.
> >
> > Thanks
> >
> > [1] 
> > https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
>
> Dave seems to have nacked this approach, no?

I may be missing something, but looking at kernel commits, there are a few
patches to support that:

E.g

commit c7648810961682b9388be2dd041df06915647445
Author: Tony Nguyen 
Date:   Mon Sep 9 06:47:44 2019 -0700

ice: Implement Dynamic Device Personalization (DDP) download

And it has been used by DPDK drivers.

Thanks

>
> > >
> > > --
> > > MST
> > >
>





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-27 Thread Michael S. Tsirkin
On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > Btw, this kind of 1:1 hash feature does not seem scalable or flexible.
> > > It requires an endless extension of bits/fields. Modern NICs allow the
> > > user to customize the hash calculation; for virtio-net we can allow an
> > > eBPF program to classify the packets. That seems more flexible and
> > > scalable, there's almost no maintenance burden in the spec (only
> > > bytecode is required, no need for any fancy features/interactions like
> > > maps), it's easy to migrate, etc.
> > >
> > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > >
> > > Thanks
> >
> > Yea BPF offload would be great to have. We have been discussing it for
> > years though - security issues keep blocking it. *Maybe* it's finally
> > going to be there but I'm not going to block this work waiting for BPF
> > offload.  And easily migrated is what BPF is not.
> 
> Just to make sure we're on the same page: I meant to find a way to
> allow the driver/user to fully customize what it wants to
> hash/classify. Similar technology, based on a private solution, has
> been used by some vendors to allow users to customize the
> classifier [1].
>
> eBPF looks like a good open-source candidate for this (there
> could be others). But there are many kinds of eBPF programs that
> could be offloaded. One famous one is XDP, which requires many features
> beyond the bytecode/VM, like map access and tail calls. Starting from
> such a complicated type is hard. Instead, we can start from a simple
> type: the eBPF classifier. All it needs is to pass the
> bytecode to the device; the device can choose to run it or compile it
> into something it can understand for classifying. We don't need maps,
> tail calls or other features.

Until people start asking exactly for maps because they want
state for their classifier? And it makes sense - if you want
e.g. load balancing you need stats which needs maps.

> We don't need to worry about security, because of its simplicity:
> the eBPF program is only in charge of classification; it has no
> other interactions with the driver, and packet modification is
> prohibited. The feature is limited to the VM/bytecode abstraction
> itself.
> 
> What's more, it's a good first step to achieve full eBPF offloading in
> the future.
> 
> Thanks
> 
> [1] 
> https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html

Dave seems to have nacked this approach, no?

> >
> > --
> > MST
> >





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-27 Thread Heng Qi




在 2023/2/27 下午4:35, Jason Wang 写道:

On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  wrote:

On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:

Btw, this kind of 1:1 hash feature does not seem scalable or flexible.
It requires an endless extension of bits/fields. Modern NICs allow the
user to customize the hash calculation; for virtio-net we can allow an
eBPF program to classify the packets. That seems more flexible and
scalable, there's almost no maintenance burden in the spec (only
bytecode is required, no need for any fancy features/interactions like
maps), it's easy to migrate, etc.

Prototype is also easy, tun/tap had an eBPF classifier for years.

Thanks

Yea BPF offload would be great to have. We have been discussing it for
years though - security issues keep blocking it. *Maybe* it's finally
going to be there but I'm not going to block this work waiting for BPF
offload.  And easily migrated is what BPF is not.

Just to make sure we're on the same page: I meant to find a way to
allow the driver/user to fully customize what it wants to
hash/classify. Similar technology, based on a private solution, has
been used by some vendors to allow users to customize the
classifier [1].

eBPF looks like a good open-source candidate for this (there
could be others). But there are many kinds of eBPF programs that
could be offloaded. One famous one is XDP, which requires many features
beyond the bytecode/VM, like map access and tail calls. Starting from
such a complicated type is hard. Instead, we can start from a simple
type: the eBPF classifier. All it needs is to pass the
bytecode to the device; the device can choose to run it or compile it
into something it can understand for classifying. We don't need maps,
tail calls or other features. We don't need to worry about security,
because of its simplicity: the eBPF program is only in charge of
classification; it has no other interactions with the driver, and
packet modification is prohibited. The feature is limited to the
VM/bytecode abstraction itself.


Since the eBPF instruction set is not complicated, some devices already
support offloading eBPF. What troubles them is how to standardize which
interfaces each device class must implement and which are optional.

There are two reasons:
#1: eBPF develops rapidly, so it is difficult for them to guarantee
backward compatibility of the interface.
#2: for device classes such as net and blk, it is unclear which
interfaces must be implemented so that the same eBPF program runs
identically on each of them.
Also, an eBPF program is not isolated; it is expected to interact with
userspace programs, and few examples demonstrate how to use them
effectively.

Maybe the virtio spec can take the lead and standardize this first.

Thanks.



What's more, it's a good first step to achieve full eBPF offloading in
the future.

Thanks

[1] 
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html


--
MST







[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-27 Thread Jason Wang
On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin  wrote:
>
> On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > Btw, this kind of 1:1 hash feature does not seem scalable or flexible.
> > It requires an endless extension of bits/fields. Modern NICs allow the
> > user to customize the hash calculation; for virtio-net we can allow an
> > eBPF program to classify the packets. That seems more flexible and
> > scalable, there's almost no maintenance burden in the spec (only
> > bytecode is required, no need for any fancy features/interactions like
> > maps), it's easy to migrate, etc.
> >
> > Prototype is also easy, tun/tap had an eBPF classifier for years.
> >
> > Thanks
>
> Yea BPF offload would be great to have. We have been discussing it for
> years though - security issues keep blocking it. *Maybe* it's finally
> going to be there but I'm not going to block this work waiting for BPF
> offload.  And easily migrated is what BPF is not.

Just to make sure we're on the same page: I meant to find a way to
allow the driver/user to fully customize what it wants to
hash/classify. Similar technology, based on a private solution, has
been used by some vendors to allow users to customize the
classifier [1].

eBPF looks like a good open-source candidate for this (there
could be others). But there are many kinds of eBPF programs that
could be offloaded. One famous one is XDP, which requires many features
beyond the bytecode/VM, like map access and tail calls. Starting from
such a complicated type is hard. Instead, we can start from a simple
type: the eBPF classifier. All it needs is to pass the
bytecode to the device; the device can choose to run it or compile it
into something it can understand for classifying. We don't need maps,
tail calls or other features. We don't need to worry about security,
because of its simplicity: the eBPF program is only in charge of
classification; it has no other interactions with the driver, and
packet modification is prohibited. The feature is limited to the
VM/bytecode abstraction itself.

What's more, it's a good first step to achieve full eBPF offloading in
the future.

Thanks

[1] 
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html

>
> --
> MST
>





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-26 Thread Michael S. Tsirkin
On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> Btw, this kind of 1:1 hash feature does not seem scalable or flexible.
> It requires an endless extension of bits/fields. Modern NICs allow the
> user to customize the hash calculation; for virtio-net we can allow an
> eBPF program to classify the packets. That seems more flexible and
> scalable, there's almost no maintenance burden in the spec (only
> bytecode is required, no need for any fancy features/interactions like
> maps), it's easy to migrate, etc.
> 
> Prototype is also easy, tun/tap had an eBPF classifier for years.
> 
> Thanks

Yea BPF offload would be great to have. We have been discussing it for
years though - security issues keep blocking it. *Maybe* it's finally
going to be there but I'm not going to block this work waiting for BPF
offload.  And easily migrated is what BPF is not.

-- 
MST





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-26 Thread Jason Wang
On Fri, Feb 24, 2023 at 4:06 PM Michael S. Tsirkin  wrote:
>
> On Fri, Feb 24, 2023 at 10:26:30AM +0800, Jason Wang wrote:
> > On Thu, Feb 23, 2023 at 9:03 PM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> > > > Hi:
> > > >
> > > > 在 2023/2/22 14:46, Heng Qi 写道:
> > > > > Hi, Jason. Long time no see. :)
> > > > >
> > > > > 在 2023/2/22 上午11:22, Jason Wang 写道:
> > > > > >
> > > > > > 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
> > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > +\subparagraph{Security risks between encapsulated packets and 
> > > > > > > > RSS}
> > > > > > > > +There may be potential security risks when encapsulated
> > > > > > > > packets using RSS to
> > > > > > > > +select queues for placement. When a user inside a tunnel
> > > > > > > > tries to control the
> > > > > >
> > > > > >
> > > > > > What do you mean by "user" here? Is it a remote or local one?
> > > > > >
> > > > >
> > > > > I mean a remote attacker who is not under the control of the tunnel
> > > > > owner.
> > > >
> > > >
> > > > Does anything make the tunnel different? I think this can happen even
> > > > without a tunnel (and even with a single queue).
> > >
> > > I think you are missing the fact that tunnel is normally a
> > > security boundary: users within the tunnel can not control
> > > what is happening outside.
> > > The feature breaks the encapsulation somewhat.
> >
> > I'm not sure I understand here, if we allow hash based on the inner
> > packet, is it something that you meant the things that are happening
> > outside? It doesn't differ too much from the case where the tunnel is
> > not used. It's impossible to prevent what a remote user is trying to
> > send, and if there's a NIC behaviour that depends on the packet
> > content, the behaviour of the NIC is somehow under the control of the
> > remote user.
> >
> > Since we only care about the device driver interface, what we can do
> > is probably:
> >
> > 1) allow the driver to disable the inner hash when it spots a
> > potential (D)DoS. And in the device, fair queueing looks like a must,
> > but that should be an implementation detail.
>
> this breaks rss

Not sure I get it; what I want to say is that the issue described
here is not something that can be addressed at the level of hashing or
RSS. It needs to be handled before the packet can arrive at any hash
filters in the RX pipeline.

There's probably no need to mention it now, considering we haven't
defined (and probably don't need to define) a full RX pipeline.
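To illustrate the device-side fairness idea mentioned above, here is a minimal round-robin dequeue model. It is purely illustrative (a real device would enforce this in hardware, per tunnel or per flow, and the flow-keyed dict here is an assumption): one packet is taken per flow per pass, so a single flooding tunnel cannot starve the others.

```python
from collections import deque

def fair_dequeue(flows, budget):
    """Dequeue at most `budget` packets, taking one packet per flow per
    pass, so one flooding flow cannot starve the other flows."""
    out = []
    while budget > 0:
        progressed = False
        for q in flows.values():
            if q and budget > 0:
                out.append(q.popleft())  # one packet from this flow
                budget -= 1
                progressed = True
        if not progressed:  # all flows empty
            break
    return out
```

With a flooding flow of 100 packets and a normal flow of 2, a budget of 4 still delivers both normal packets instead of letting the flood monopolize the queue.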

>
> > 2) hash based on both outer and inner
>
> this might help a bit
>
> > >
> > > For example without tunneling it is possible
> > > to create a special "bad guy queue" and direct specific tunnels
> > > there by playing with key and indirection table.
> >
> > Does anything make the tunneling different? We can still do this via the
> > inner header hash, or at least we can disable the inner hash if we see
> > a remote DOS.
> >
> > Thanks
>
> the difference is that tunneling is used for security/partitioning.

The problem is that we don't/can't support every type of tunnel. It
should be no different from an old virtio-net device without tunnel
hashing.

Btw, this kind of 1:1 hash features seems not scalable and flexible.
It requires an endless extension on bits/fields. Modern NICs allow the
user to customize the hash calculation, for virtio-net we can allow to
use eBPF program to classify the packets. It seems to be more flexible
and scalable and there's almost no maintain burden in the spec (only
bytecode is required, no need any fancy features/interactions like
maps), easy to be migrated etc.

Prototype is also easy, tun/tap had an eBPF classifier for years.

Thanks
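To make the classifier idea concrete, here is a hypothetical model (not from this thread; the packet layout, the use of CRC32 as the hash, and the queue count are all illustrative assumptions) of the kind of logic an offloaded steering program would implement: parse the outer IPv4 header, and if the packet is GRE-encapsulated, steer by a hash of the *inner* 5-tuple.

```python
import struct
import zlib

def classify_gre_inner(pkt: bytes, num_queues: int) -> int:
    """Toy classifier: steer a GRE-encapsulated IPv4/TCP packet by a hash
    of its inner 5-tuple; fall back to the outer addresses otherwise."""
    outer_proto = pkt[9]                      # protocol field of outer IPv4
    if outer_proto != 47:                     # 47 = GRE; else use outer hash
        src, dst = struct.unpack_from("!4s4s", pkt, 12)
        return zlib.crc32(src + dst) % num_queues
    inner = pkt[20 + 4:]                      # skip outer IPv4 + basic GRE hdr
    i_src, i_dst = struct.unpack_from("!4s4s", inner, 12)
    i_proto = inner[9]
    ihl = (inner[0] & 0x0F) * 4               # inner IPv4 header length
    sport, dport = struct.unpack_from("!HH", inner, ihl)
    flow = i_src + i_dst + bytes([i_proto]) + struct.pack("!HH", sport, dport)
    return zlib.crc32(flow) % num_queues
```

The point of the sketch: two packets of the same inner flow land on the same queue even when their outer headers differ, which is exactly what outer-header-only hashing cannot provide for GRE.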

>
> > >
> > > > Mitigating those attackers seems more like an implementation detail,
> > > > which might require fair queuing or other QoS technology that has
> > > > been well studied.
> > > >
> > > > It seems outside the scope of the spec (unless we want
> > > > driver-manageable QoS).
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > > > > the device with invalid
> > > > > > > > +packets, and the flooded packets may be hashed into the
> > > > > > > > same queue as packets in
> > > > > > > > +other normal tunnels, causing the queue to overflow.
> > > > > > > > +
> > > > > > > > +This can pose several security risks:
> > > > > > > > +\begin{itemize}
> > > > > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > > > > enqueued due to queue
> > > > > > > > +   overflow, resulting in a large amount of packet loss.
> > > > > > > > +\item  The delay and retransmission of packets in the
> > > > > > > > normal tunnels are extremely increased.
> > > > > > > > +\item  The user can observe the traffic information and
> > > > > > > > enqueue 

[virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-26 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Friday, February 24, 2023 3:13 AM

[..]
> > The inner hash is only needed for GRE, IPIP etc.
> > For VXLAN and NVGRE Linux kernel transmit side uses the entropy of the
> source port of the outer header.
> > It does that based on the inner header.
> > Refer to [1] as one example.
> >
> > [1]
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L9
> > 22
> 
> But I think hash was requested for RSS with dpdk, no?
> 
Yes, but if the source inserts the entropy (DPDK or kernel), UDP-based
tunnels can live with the outer header hash.
IP-over-IP and GRE tunnels are the ones that need the benefit, provided
the queues do not overflow or the processing is fast enough, as Heng
explained.
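A sketch of the source-port entropy trick for UDP tunnels referenced above (modeled loosely on the kernel's udp_flow_src_port()/geneve behavior; the hash function used here, CRC32, and the port range are illustrative assumptions — the kernel uses its internal flow hash and the local ephemeral range):

```python
import zlib

def tunnel_udp_src_port(inner_flow: bytes, lo: int = 32768, hi: int = 60999) -> int:
    """Derive the outer UDP source port from a hash of the inner flow, so
    plain outer-header RSS on the receiver still spreads tunneled flows."""
    return lo + zlib.crc32(inner_flow) % (hi - lo + 1)
```

The same inner flow always yields the same outer source port, so per-flow packet ordering is preserved, while different inner flows spread across distinct outer 5-tuples — which is why VXLAN and GENEVE can often live with an outer-header hash.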


> > > The lookup will work like this then:
> > >
> > > calculate outer hash
> > > if (rss[outer hash] & tunnel bit)
> > Tunnel bit, you mean tunneled packet, right?
> 
> this idea stores a bit in the indirection table which signals which of the 
> hashes
> to use for rss
> 
> > > then
> > >   calculate inner hash
> > >   return rss[inner hash] & ~tunnel bit
> > Why to end with a tunnel bit?
> 
> 
> this just clears the bit so we end up with a vq number.
> 
> > > else
> > >   return rss[outer hash]
> > >
> > >

The above scheme partitions the RSS indirection table into two parts:
1. one for tunnel processing;
2. one without it (this part uses the outer hash, as today).

When #1 is applied in your example, it is without hierarchy,
so the inner hash can still collide into the same VQ as before.
Say there are VQs 0,1,2,3.
The indirection table is set up so that entries for 0,1 have the tunnel
bit set and entries for 2,3 have it cleared.
RSS of the outer hash finds the bit set, and the inner hashes of two
different tunnels can still map to a single VQ.
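The lookup Michael proposed can be modeled as follows. This is illustrative only (16-bit entries with bit 15 as the tunnel bit are the assumption from the proposal, not from the spec), and it also exposes the collision Parav describes: two tunnels whose inner hashes select the same entry land on the same VQ.

```python
TUNNEL_BIT = 0x8000  # assumed: top bit of a 16-bit indirection table entry

def rss_select_queue(table, outer_hash, inner_hash):
    """Model of the proposed lookup: if the entry selected by the outer
    hash has the tunnel bit set, redo the lookup with the inner hash and
    clear the bit so a plain VQ number is returned."""
    entry = table[outer_hash % len(table)]
    if entry & TUNNEL_BIT:
        return table[inner_hash % len(table)] & ~TUNNEL_BIT
    return entry

# Parav's example: entries for VQ 0,1 carry the tunnel bit, VQ 2,3 do not.
table = [0 | TUNNEL_BIT, 1 | TUNNEL_BIT, 2, 3]
```

Note the second lookup reuses the same table, so an inner hash may select an entry whose tunnel bit is set; the bit is simply cleared, which is why the scheme gives no hierarchy between tunnels.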

> > > this fixes the security issue returning us back to status quo :
> > > specific tunnels can be directed to separate queues.
> > >
> > The number of tunnels is far higher than the number of queues with para virt
> driver doing decap.
> 
> True. This seeks to get us back to where we are before the feature:
> driver can send specific outer hashes to specific queues.
> outer hash collisions remain a problem.
> 
So far the mlx5 device has hashed on the inner header for non-UDP
tunnels.

Steering packets to specific queues is done by programming flows to
specific RQs, which supports matching on both outer and inner headers.
Ethtool --config-nfc has had this for a long time too; such flow
steering is due for virtio-net as well.
It is orthogonal to RSS.

> 
> > >
> > > This is for RSS.
> > >
> > >
> > > For hash reporting indirection table is not used.
> > > Maybe it is enough to signal to driver that inner hash was used.
> > > We do need that signalling though.
> > >
> > > My question would be whether it's practical to implement in hardware.
> >
> > In the above example, hw calculating a double hash is difficult without much gain.
> > Either calculating on the inner or the outer makes sense.
> >
> > Signaling whether it was calculated on the inner or the outer is fine because hw
> > tells exactly what it did.
> 
> This, in a sense, is what reporting hash tunnel type did.
> Do you now think we need it?

I don't see a consumer sw of it.
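If such signalling were ever added, it could be as small as one flag next to the reported hash. The layout below is purely hypothetical (no such structure or field exists in the virtio spec); every name in it is invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch only: one way a device could tell the driver
 * which header the reported hash covered. Not spec text. */
struct vnet_hash_info {
    uint32_t hash_value;    /* the reported hash */
    uint16_t hash_report;   /* packet type, as VIRTIO_NET_HASH_REPORT_* today */
    uint16_t hash_is_inner; /* 1: computed over the inner header, 0: outer */
};
```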

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-24 Thread Michael S. Tsirkin
On Fri, Feb 24, 2023 at 10:38:37PM +0800, Heng Qi wrote:
> 
> 
> 在 2023/2/24 下午4:13, Michael S. Tsirkin 写道:
> > On Thu, Feb 23, 2023 at 02:40:46PM +, Parav Pandit wrote:
> > > 
> > > > From: Michael S. Tsirkin 
> > > > Sent: Thursday, February 23, 2023 8:14 AM
> > > > 
> > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > 
> > > > So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up 
> > > > with
> > > > an idea: RSS indirection table entries are 16 bit but only 15 bits are 
> > > > used to
> > > > identify an RX queue.
> > > > We can use the remaining bit as a "tunnel bit" to signal whether to use 
> > > > the
> > > > inner or the outer hash for queue selection.
> > > > 
> > > I further brainstormed internally with Saeed and Rony on this.
> > > 
> > > The inner hash is only needed for GRE, IPIP etc.
> > > For VXLAN and NVGRE Linux kernel transmit side uses the entropy of the 
> > > source port of the outer header.
> > > It does that based on the inner header.
> > > Refer to [1] as one example.
> > > 
> > > [1] 
> > > https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922
> > But I think hash was requested for RSS with dpdk, no?
> 
> I think yes, at least probably the first customer to use the feature might
> be dpdk.:)
> 
> 
> > 
> > > > The lookup will work like this then:
> > > > 
> > > > calculate outer hash
> > > > if (rss[outer hash] & tunnel bit)
> > > Tunnel bit, you mean tunneled packet, right?
> > this idea stores a bit in the indirection table
> > which signals which of the hashes to use for rss
> 
> This gives the inner hash the ability to select a queue and place
> packets into that queue (that is, in parallel with RSS),
> which seems different from our discussion before v9.
> 
> Thanks.

Not exactly. The idea is that we start with outer hash.
Based on that we use rss table to decide whether to use the inner hash.

Given that Parav claims it's difficult to implement in
hardware I'm not insisting this idea is included
in the patchset. We can add it later.


> > 
> > > > then
> > > > calculate inner hash
> > > > return rss[inner hash] & ~tunnel bit
> > > Why to end with a tunnel bit?
> > 
> > this just clears the bit so we end up with a vq number.
> > 
> > > > else
> > > > return rss[outer hash]
> > > > 
> > > > 
> > > > this fixes the security issue returning us back to status quo : 
> > > > specific tunnels can
> > > > be directed to separate queues.
> > > > 
> > > The number of tunnels is far higher than the number of queues with para 
> > > virt driver doing decap.
> > True. This seeks to get us back to where we are before the feature:
> > driver can send specific outer hashes to specific queues.
> > outer hash collisions remain a problem.
> > 
> > 
> > > > This is for RSS.
> > > > 
> > > > 
> > > > For hash reporting indirection table is not used.
> > > > Maybe it is enough to signal to driver that inner hash was used.
> > > > We do need that signalling though.
> > > > 
> > > > My question would be whether it's practical to implement in hardware.
> > > In the above example, hw calculating a double hash is difficult without much 
> > > gain.
> > > Either calculating on the inner or the outer makes sense.
> > > 
> > > Signaling whether it was calculated on the inner or the outer is fine because hw 
> > > tells exactly what it did.
> > This, in a sense, is what reporting hash tunnel type did.
> > Do you now think we need it?
> > 





Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-24 Thread Heng Qi




在 2023/2/24 下午4:13, Michael S. Tsirkin 写道:

On Thu, Feb 23, 2023 at 02:40:46PM +, Parav Pandit wrote:



From: Michael S. Tsirkin 
Sent: Thursday, February 23, 2023 8:14 AM

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:



So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with
an idea: RSS indirection table entries are 16 bit but only 15 bits are used to
identify an RX queue.
We can use the remaining bit as a "tunnel bit" to signal whether to use the
inner or the outer hash for queue selection.


I further brainstormed internally with Saeed and Rony on this.

The inner hash is only needed for GRE, IPIP etc.
For VXLAN and NVGRE Linux kernel transmit side uses the entropy of the source 
port of the outer header.
It does that based on the inner header.
Refer to [1] as one example.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922
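The kernel technique referenced above boils down to deriving the outer UDP source port from a hash of the inner flow (the kernel's actual helper is udp_flow_src_port()). A minimal sketch, where the port range is an assumption mirroring the common ephemeral default and the function name is invented:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the entropy trick used by VXLAN/GENEVE transmit paths:
 * hash the inner flow and fold the result into the outer UDP source
 * port, so receive-side RSS on the outer 4-tuple already separates
 * distinct inner flows without parsing the encapsulation. */
static uint16_t tunnel_src_port(uint32_t inner_flow_hash,
                                uint16_t port_min, uint16_t port_max)
{
    uint32_t range = (uint32_t)port_max - port_min + 1;

    return (uint16_t)(port_min + inner_flow_hash % range);
}
```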

But I think hash was requested for RSS with dpdk, no?


I think yes, at least probably the first customer to use the feature 
might be dpdk.:)






The lookup will work like this then:

calculate outer hash
if (rss[outer hash] & tunnel bit)

Tunnel bit, you mean tunneled packet, right?

this idea stores a bit in the indirection table
which signals which of the hashes to use for rss


This gives the inner hash the ability to select a queue and place 
packets into that queue (that is, in parallel with RSS),

which seems different from our discussion before v9.

Thanks.




then
calculate inner hash
return rss[inner hash] & ~tunnel bit

Why to end with a tunnel bit?


this just clears the bit so we end up with a vq number.


else
return rss[outer hash]


this fixes the security issue returning us back to status quo : specific 
tunnels can
be directed to separate queues.


The number of tunnels is far higher than the number of queues with para virt 
driver doing decap.

True. This seeks to get us back to where we are before the feature:
driver can send specific outer hashes to specific queues.
outer hash collisions remain a problem.



This is for RSS.


For hash reporting indirection table is not used.
Maybe it is enough to signal to driver that inner hash was used.
We do need that signalling though.

My question would be whether it's practical to implement in hardware.

In the above example, hw calculating a double hash is difficult without much gain.
Either calculating on the inner or the outer makes sense.

Signaling whether it was calculated on the inner or the outer is fine because hw tells 
exactly what it did.

This, in a sense, is what reporting hash tunnel type did.
Do you now think we need it?







[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-24 Thread Michael S. Tsirkin
On Thu, Feb 23, 2023 at 02:40:46PM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Thursday, February 23, 2023 8:14 AM
> > 
> > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> 
> 
> > So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with
> > an idea: RSS indirection table entries are 16 bit but only 15 bits are used 
> > to
> > identify an RX queue.
> > We can use the remaining bit as a "tunnel bit" to signal whether to use the
> > inner or the outer hash for queue selection.
> >
> I further brainstormed internally with Saeed and Rony on this.
> 
> The inner hash is only needed for GRE, IPIP etc.
> For VXLAN and NVGRE Linux kernel transmit side uses the entropy of the source 
> port of the outer header.
> It does that based on the inner header.
> Refer to [1] as one example.
> 
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922

But I think hash was requested for RSS with dpdk, no?

> > The lookup will work like this then:
> > 
> > calculate outer hash
> > if (rss[outer hash] & tunnel bit)
> Tunnel bit, you mean tunneled packet, right?

this idea stores a bit in the indirection table
which signals which of the hashes to use for rss

> > then
> > calculate inner hash
> > return rss[inner hash] & ~tunnel bit
> Why to end with a tunnel bit?


this just clears the bit so we end up with a vq number.

> > else
> > return rss[outer hash]
> > 
> > 
> > this fixes the security issue returning us back to status quo : specific 
> > tunnels can
> > be directed to separate queues.
> >
> The number of tunnels is far higher than the number of queues with para virt 
> driver doing decap.

True. This seeks to get us back to where we are before the feature:
driver can send specific outer hashes to specific queues.
outer hash collisions remain a problem.


> > 
> > This is for RSS.
> > 
> > 
> > For hash reporting indirection table is not used.
> > Maybe it is enough to signal to driver that inner hash was used.
> > We do need that signalling though.
> > 
> > My question would be whether it's practical to implement in hardware.
> 
> In the above example, hw calculating a double hash is difficult without much gain.
> Either calculating on the inner or the outer makes sense.
> 
> Signaling whether it was calculated on the inner or the outer is fine because hw 
> tells exactly what it did.

This, in a sense, is what reporting hash tunnel type did.
Do you now think we need it?

-- 
MST





Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-24 Thread Michael S. Tsirkin
On Fri, Feb 24, 2023 at 10:45:13AM +0800, Jason Wang wrote:
> 
> 在 2023/2/23 12:41, Heng Qi 写道:
> > 
> > 
> > 在 2023/2/23 上午10:50, Jason Wang 写道:
> > > Hi:
> > > 
> > > 在 2023/2/22 14:46, Heng Qi 写道:
> > > > Hi, Jason. Long time no see. :)
> > > > 
> > > > 在 2023/2/22 上午11:22, Jason Wang 写道:
> > > > > 
> > > > > 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
> > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > +\subparagraph{Security risks between encapsulated packets and 
> > > > > > > RSS}
> > > > > > > +There may be potential security risks when
> > > > > > > encapsulated packets using RSS to
> > > > > > > +select queues for placement. When a user inside a
> > > > > > > tunnel tries to control the
> > > > > 
> > > > > 
> > > > > What do you mean by "user" here? Is it a remote or local one?
> > > > > 
> > > > 
> > > > I mean a remote attacker who is not under the control of the
> > > > tunnel owner.
> > > 
> > > 
> > > Does anything make the tunnel different? I think this can happen even
> > > without a tunnel (and even with a single queue).
> > 
> > I agree.
> > 
> > > 
> > > How to mitigate those attackers seems more like an implementation
> > > detail that might require fair queuing or other QoS technology,
> > > which has been well studied.
> > 
> > I am also not sure whether this point needs to be focused on in the
> > spec, and I see that protection against tunnel DoS is mostly handled
> > outside the device,
> > but it seems okay to give some attack reminders.
> 
> 
> Maybe it's sufficient to say the device should ensure fairness among
> different flows when queuing packets?
> 
> Thanks

that isn't really achievable.

> 
> > 
> > Thanks.
> > 
> > > 
> > > It seems out of the scope of the spec (unless we want to let driver
> > > manageable QOS).
> > > 
> > > Thanks
> > > 
> > > 
> > > > 
> > > > Thanks.
> > > > 
> > > > > 
> > > > > > > +enqueuing of encapsulated packets, then the user
> > > > > > > can flood the device with invalid
> > > > > > > +packets, and the flooded packets may be hashed into
> > > > > > > the same queue as packets in
> > > > > > > +other normal tunnels, causing the queue to overflow.
> > > > > > > +
> > > > > > > +This can pose several security risks:
> > > > > > > +\begin{itemize}
> > > > > > > +\item  Encapsulated packets in the normal tunnels
> > > > > > > cannot be enqueued due to queue
> > > > > > > +   overflow, resulting in a large amount of packet loss.
> > > > > > > +\item  The delay and retransmission of packets in
> > > > > > > the normal tunnels are extremely increased.
> > > > > > > +\item  The user can observe the traffic information
> > > > > > > and enqueue information of other normal
> > > > > > > +   tunnels, and conduct targeted DoS attacks.
> > > > > > > +\end{itemize}
> > > > > > > +
> > > > > > Hmm with this all written out it sounds pretty severe.
> > > > > 
> > > > > 
> > > > > I think we need first understand whether or not it's a
> > > > > problem that we need to solve at spec level:
> > > > > 
> > > > > 1) anything make encapsulated packets different or why we
> > > > > can't hit this problem without encapsulation
> > > > > 
> > > > > 2) whether or not it's the implementation details that the
> > > > > spec doesn't need to care (or how it is solved in real NIC)
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > > At this point with no ways to mitigate, I don't feel
> > > > > > this is something
> > > > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > > > others  find this somehow useful e.g. for dpdk.
> > > > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > > > and asking them for the opinion?
> > > > > > 
> > > > > > 
> > > > 
> > > 
> > > 
> > > -
> > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> > 





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-24 Thread Michael S. Tsirkin
On Fri, Feb 24, 2023 at 10:26:30AM +0800, Jason Wang wrote:
> On Thu, Feb 23, 2023 at 9:03 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> > > Hi:
> > >
> > > 在 2023/2/22 14:46, Heng Qi 写道:
> > > > Hi, Jason. Long time no see. :)
> > > >
> > > > 在 2023/2/22 上午11:22, Jason Wang 写道:
> > > > >
> > > > > 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
> > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > +\subparagraph{Security risks between encapsulated packets and 
> > > > > > > RSS}
> > > > > > > +There may be potential security risks when encapsulated
> > > > > > > packets using RSS to
> > > > > > > +select queues for placement. When a user inside a tunnel
> > > > > > > tries to control the
> > > > >
> > > > >
> > > > > What do you mean by "user" here? Is it a remote or local one?
> > > > >
> > > >
> > > > I mean a remote attacker who is not under the control of the tunnel
> > > > owner.
> > >
> > >
> > > Does anything make the tunnel different? I think this can happen even without
> > > a tunnel (and even with a single queue).
> >
> > I think you are missing the fact that tunnel is normally a
> > security boundary: users within the tunnel can not control
> > what is happening outside.
> > The feature breaks the encapsulation somewhat.
> 
> I'm not sure I understand here: if we allow hashing based on the inner
> packet, is that what you meant by the things that are happening
> outside? It doesn't differ too much from the case where the tunnel is
> not used. It's impossible to prevent what a remote user is trying to
> send, and if there's a NIC behaviour that depends on the packet
> content, the behaviour of the NIC is somehow under the control of the
> remote user.
> 
> Since we only care about the device driver interface, what we can do
> is probably:
> 
> 1) allow the driver to disable the inner hash when it spots a
> potential (D)DoS. And in the device, fair queueing looks like a must,
> but it should be an implementation detail.

this breaks rss

> 2) hash based on both outer and inner

this might help a bit
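Option 2 above (hashing over both headers) could look like the following. The mix function is an arbitrary example (a Boost-style hash_combine); neither the spec nor any real device mandates this particular combiner:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: fold the outer and inner flow hashes together so a remote
 * sender who controls only the inner header cannot fully choose the
 * resulting RSS bucket. */
static uint32_t combined_hash(uint32_t outer, uint32_t inner)
{
    /* Boost-style hash_combine; the constant is the 32-bit golden ratio. */
    return outer ^ (inner + 0x9e3779b9u + (outer << 6) + (outer >> 2));
}
```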

> >
> > For example without tunneling it is possible
> > to create a special "bad guy queue" and direct specific tunnels
> > there by playing with key and indirection table.
> 
> Does anything make the tunneling different? We can still do this via the
> inner header hash, or at least we can disable the inner hash if we see
> a remote DOS.
> 
> Thanks

the difference is that tunneling is used for security/partitioning.

> >
> > > How to mitigate those attackers seems more like an implementation detail
> > > that might require fair queuing or other QoS technology, which has been
> > > well studied.
> > >
> > > It seems out of the scope of the spec (unless we want to let driver
> > > manageable QOS).
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > > > the device with invalid
> > > > > > > +packets, and the flooded packets may be hashed into the
> > > > > > > same queue as packets in
> > > > > > > +other normal tunnels, causing the queue to overflow.
> > > > > > > +
> > > > > > > +This can pose several security risks:
> > > > > > > +\begin{itemize}
> > > > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > > > enqueued due to queue
> > > > > > > +   overflow, resulting in a large amount of packet loss.
> > > > > > > +\item  The delay and retransmission of packets in the
> > > > > > > normal tunnels are extremely increased.
> > > > > > > +\item  The user can observe the traffic information and
> > > > > > > enqueue information of other normal
> > > > > > > +   tunnels, and conduct targeted DoS attacks.
> > > > > > > +\end{itemize}
> > > > > > > +
> > > > > > Hmm with this all written out it sounds pretty severe.
> > > > >
> > > > >
> > > > > I think we need first understand whether or not it's a problem that
> > > > > we need to solve at spec level:
> > > > >
> > > > > 1) anything make encapsulated packets different or why we can't hit
> > > > > this problem without encapsulation
> > > > >
> > > > > 2) whether or not it's the implementation details that the spec
> > > > > doesn't need to care (or how it is solved in real NIC)
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > > At this point with no ways to mitigate, I don't feel this is 
> > > > > > something
> > > > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > > > others  find this somehow useful e.g. for dpdk.
> > > > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > > > and asking them for the opinion?
> > > > > >
> > > > > >
> > > >
> >





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-24 Thread Michael S. Tsirkin
On Fri, Feb 24, 2023 at 12:42:40PM +0800, Heng Qi wrote:
> 
> 
> 在 2023/2/23 下午9:13, Michael S. Tsirkin 写道:
> > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > +There may be potential security risks when encapsulated packets using 
> > > RSS to
> > > +select queues for placement.
> > Is this just with RSS? I assume hash calculation is also used for
> > something like queueing so there's a similar risk even just
> > with hash reporting, no?
> 
> I don't understand why it would be risky to just report the hash when it is not used
> for queuing,
> and now we don't even report whether the hash comes from the inner or the outer header,
> because there is no more hash_report_tunnel.

Well what is the hash used for? Presumably it's used for queueing within
driver, no? Collisions there then have exactly the same effect as queue
collisions in the device.

> > 
> > > When a user inside a tunnel tries to control the
> > > +enqueuing of encapsulated packets, then the user can flood the device 
> > > with invalid
> > > +packets, and the flooded packets may be hashed into the same queue as 
> > > packets in
> > > +other normal tunnels, causing the queue to overflow.
> > > +
> > > +This can pose several security risks:
> > > +\begin{itemize}
> > > +\item  Encapsulated packets in the normal tunnels cannot be enqueued due 
> > > to queue
> > > +   overflow, resulting in a large amount of packet loss.
> > > +\item  The delay and retransmission of packets in the normal tunnels are 
> > > extremely increased.
> > > +\item  The user can observe the traffic information and enqueue 
> > > information of other normal
> > > +   tunnels, and conduct targeted DoS attacks.
> > > +\end{itemize}
> > > +
> > 
> > So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came
> > up with an idea: RSS indirection table entries are 16 bit but
> > only 15 bits are used to identify an RX queue.
> > We can use the remaining bit as a "tunnel bit" to signal whether to use the
> > inner or the outer hash for queue selection.
> > 
> > The lookup will work like this then:
> > 
> > calculate outer hash
> > if (rss[outer hash] & tunnel bit)
> 
> How does a tunnel bit distinguish between multiple tunnel types? And I think it
> is not so reasonable to use the
> indirection table to determine the switch of the inner hash. The inner hash
> is only the ability to calculate the hash
> and does not involve the indirection table.
> 
> Thanks.
> 
> > then
> > calculate inner hash
> > return rss[inner hash] & ~tunnel bit
> > else
> > return rss[outer hash]
> > 
> > 
> > this fixes the security issue returning us back to
> > status quo : specific tunnels can be directed to separate queues.
> > 
> > 
> > This is for RSS.
> > 
> > 
> > For hash reporting indirection table is not used.
> > Maybe it is enough to signal to driver that inner hash was used.
> > We do need that signalling though.
> > 
> > My question would be whether it's practical to implement in hardware.
> > 





[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Heng Qi




在 2023/2/24 上午10:45, Jason Wang 写道:


在 2023/2/23 12:41, Heng Qi 写道:



在 2023/2/23 上午10:50, Jason Wang 写道:

Hi:

在 2023/2/22 14:46, Heng Qi 写道:

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets 
using RSS to
+select queues for placement. When a user inside a tunnel tries 
to control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel 
owner.



Does anything make the tunnel different? I think this can happen even 
without a tunnel (and even with a single queue).


I agree.



How to mitigate those attackers seems more like an implementation 
detail that might require fair queuing or other QoS technology, 
which has been well studied.


I am also not sure whether this point needs to be focused on in the 
spec, and I see that protection against tunnel DoS is mostly 
handled outside the device,

but it seems okay to give some attack reminders.



Maybe it's sufficient to say the device should ensure fairness 
among different flows when queuing packets?


Yes, maybe the device does not guarantee QoS or needs to guarantee 
enqueue fairness between flows.


Thanks.



Thanks




Thanks.



It seems out of the scope of the spec (unless we want to let driver 
manageable QOS).


Thanks




Thanks.



+enqueuing of encapsulated packets, then the user can flood the 
device with invalid
+packets, and the flooded packets may be hashed into the same 
queue as packets in

+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be 
enqueued due to queue

+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal 
tunnels are extremely increased.
+\item  The user can observe the traffic information and enqueue 
information of other normal

+   tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we need first understand whether or not it's a problem 
that we need to solve at spec level:


1) anything make encapsulated packets different or why we can't 
hit this problem without encapsulation


2) whether or not it's the implementation details that the spec 
doesn't need to care (or how it is solved in real NIC)


Thanks


At this point with no ways to mitigate, I don't feel this is 
something

e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?












This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: 
https://www.oasis-open.org/policies-guidelines/mailing-lists

Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/






[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Heng Qi




在 2023/2/23 下午9:13, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets using RSS to
+select queues for placement.

Is this just with RSS? I assume hash calculation is also used for
something like queueing so there's a similar risk even just
with hash reporting, no?


I don't understand why it would be risky to just report the hash when it is not 
used for queuing,
and now we don't even report whether the hash comes from the inner or the outer header, 
because there is no more hash_report_tunnel.





When a user inside a tunnel tries to control the
+enqueuing of encapsulated packets, then the user can flood the device with 
invalid
+packets, and the flooded packets may be hashed into the same queue as packets 
in
+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be enqueued due to 
queue
+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal tunnels are 
extremely increased.
+\item  The user can observe the traffic information and enqueue information of 
other normal
+   tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+


So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came
up with an idea: RSS indirection table entries are 16 bit but
only 15 bits are used to identify an RX queue.
We can use the remaining bit as a "tunnel bit" to signal whether to use the
inner or the outer hash for queue selection.

The lookup will work like this then:

calculate outer hash
if (rss[outer hash] & tunnel bit)


How does a tunnel bit distinguish between multiple tunnel types? And I 
think it is not so reasonable to use the
indirection table to determine the switch of the inner hash. The inner 
hash is only the ability to calculate the hash

and does not involve the indirection table.

Thanks.


then
calculate inner hash
return rss[inner hash] & ~tunnel bit
else
return rss[outer hash]


this fixes the security issue returning us back to
status quo : specific tunnels can be directed to separate queues.


This is for RSS.


For hash reporting indirection table is not used.
Maybe it is enough to signal to driver that inner hash was used.
We do need that signalling though.

My question would be whether it's practical to implement in hardware.







Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Jason Wang



在 2023/2/23 12:41, Heng Qi 写道:



在 2023/2/23 上午10:50, Jason Wang 写道:

Hi:

在 2023/2/22 14:46, Heng Qi 写道:

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets 
using RSS to
+select queues for placement. When a user inside a tunnel tries 
to control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel 
owner.



Does anything make the tunnel different? I think this can happen even 
without a tunnel (and even with a single queue).


I agree.



How to mitigate those attackers seems more like an implementation 
detail that might require fair queuing or other QoS technology, 
which has been well studied.


I am also not sure whether this point needs to be focused on in the 
spec, and I see that protection against tunnel DoS is mostly 
handled outside the device,

but it seems okay to give some attack reminders.



Maybe it's sufficient to say the device should ensure fairness 
among different flows when queuing packets?


Thanks




Thanks.



It seems out of the scope of the spec (unless we want to let driver 
manageable QOS).


Thanks




Thanks.



+enqueuing of encapsulated packets, then the user can flood the 
device with invalid
+packets, and the flooded packets may be hashed into the same 
queue as packets in

+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be 
enqueued due to queue

+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal 
tunnels are extremely increased.
+\item  The user can observe the traffic information and enqueue 
information of other normal

+   tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we need first understand whether or not it's a problem that 
we need to solve at spec level:


1) anything make encapsulated packets different or why we can't hit 
this problem without encapsulation


2) whether or not it's the implementation details that the spec 
doesn't need to care (or how it is solved in real NIC)


Thanks


At this point with no ways to mitigate, I don't feel this is 
something

e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?







-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org








[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Jason Wang
On Thu, Feb 23, 2023 at 9:03 PM Michael S. Tsirkin  wrote:
>
> On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> > Hi:
> >
> > 在 2023/2/22 14:46, Heng Qi 写道:
> > > Hi, Jason. Long time no see. :)
> > >
> > > 在 2023/2/22 上午11:22, Jason Wang 写道:
> > > >
> > > > 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
> > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > > +There may be potential security risks when encapsulated
> > > > > > packets using RSS to
> > > > > > +select queues for placement. When a user inside a tunnel
> > > > > > tries to control the
> > > >
> > > >
> > > > What do you mean by "user" here? Is it a remote or local one?
> > > >
> > >
> > > I mean a remote attacker who is not under the control of the tunnel
> > > owner.
> >
> >
> > Anything may the tunnel different? I think this can happen even without
> > tunnel (and even with single queue).
>
> I think you are missing the fact that tunnel is normally a
> security boundary: users within the tunnel can not control
> what is happening outside.
> The feature breaks the encapsulation somewhat.

I'm not sure I understand here: if we allow hashing based on the inner
packet, is that what you mean by things happening outside? It doesn't
differ much from the case where no tunnel is used. It's impossible to
prevent what a remote user sends, and if there's NIC behaviour that
depends on packet content, that behaviour is somehow under the control
of the remote user.

Since we only care about the device driver interface, what we can do
is probably:

1) allow the driver to disable the inner hash when it spots a
potential (D)DoS. In the device, fair queueing looks like a must,
but that should be an implementation detail.
2) hash based on both the outer and inner headers

>
> For example without tunneling it is possible
> to create a special "bad guy queue" and direct specific tunnels
> there by playing with key and indirection table.

Does anything make tunneling different? We can still do this via the
inner header hash, or at least we can disable the inner hash if we see
a remote DoS.

Thanks

>
> > How to mitigate those attackers seems more like a implementation details
> > where might require fair queuing or other QOS technology which has been well
> > studied.
> >
> > It seems out of the scope of the spec (unless we want to let driver
> > manageable QOS).
> >
> > Thanks
> >
> >
> > >
> > > Thanks.
> > >
> > > >
> > > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > > the device with invalid
> > > > > > +packets, and the flooded packets may be hashed into the
> > > > > > same queue as packets in
> > > > > > +other normal tunnels, causing the queue to overflow.
> > > > > > +
> > > > > > +This can pose several security risks:
> > > > > > +\begin{itemize}
> > > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > > enqueued due to queue
> > > > > > +   overflow, resulting in a large amount of packet loss.
> > > > > > +\item  The delay and retransmission of packets in the
> > > > > > normal tunnels are extremely increased.
> > > > > > +\item  The user can observe the traffic information and
> > > > > > enqueue information of other normal
> > > > > > +   tunnels, and conduct targeted DoS attacks.
> > > > > > +\end{itemize}
> > > > > > +
> > > > > Hmm with this all written out it sounds pretty severe.
> > > >
> > > >
> > > > I think we need first understand whether or not it's a problem that
> > > > we need to solve at spec level:
> > > >
> > > > 1) anything make encapsulated packets different or why we can't hit
> > > > this problem without encapsulation
> > > >
> > > > 2) whether or not it's the implementation details that the spec
> > > > doesn't need to care (or how it is solved in real NIC)
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > At this point with no ways to mitigate, I don't feel this is something
> > > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > > others  find this somehow useful e.g. for dpdk.
> > > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > > and asking them for the opinion?
> > > > >
> > > > >
> > >
>





[virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Thursday, February 23, 2023 8:14 AM
> 
> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:


> So for RSS specifically, we brainstormed with Amnon (Cc'd) and came up with
> an idea: RSS indirection table entries are 16 bits but only 15 bits are used to
> identify an RX queue.
> We can use the remaining bit as a "tunnel bit" to signal whether to use the
> inner or the outer hash for queue selection.
>
I further brainstormed internally with Saeed and Rony on this.

The inner hash is only needed for GRE, IPIP, etc.
For VXLAN and NVGRE, the Linux kernel's transmit side puts entropy into the source 
port of the outer header, computed from the inner header.
Refer to [1] as one example.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922
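The mechanism Parav describes can be sketched as follows: derive the outer UDP source port of a VXLAN/GENEVE packet from a hash of the inner flow, so underlay ECMP/RSS on the outer header still spreads inner flows. This is an illustrative sketch only; the hash (CRC32 here) and helper names are assumptions, not the kernel's actual jhash-based flow dissector or `udp_flow_src_port()`.

```python
# Sketch: encode inner-flow entropy in the outer UDP source port, in the
# spirit of what the Linux geneve/vxlan transmit paths do.
import zlib

EPHEMERAL_MIN, EPHEMERAL_MAX = 49152, 65535  # IANA dynamic port range

def inner_flow_hash(src_ip: str, dst_ip: str, proto: int,
                    src_port: int, dst_port: int) -> int:
    """Hash the inner 5-tuple (simplified; the kernel hashes the skb flow keys)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)

def outer_udp_src_port(inner_hash: int) -> int:
    """Fold the inner-flow hash into the ephemeral source-port range."""
    span = EPHEMERAL_MAX - EPHEMERAL_MIN + 1
    return EPHEMERAL_MIN + (inner_hash % span)

h = inner_flow_hash("10.0.0.1", "10.0.0.2", 6, 12345, 443)
port = outer_udp_src_port(h)
```

The point is that a receiver hashing only the outer header still sees per-inner-flow variation, which is why an explicit inner hash matters mainly for encapsulations without such entropy (GRE, IPIP).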

> The lookup will work like this then:
> 
> calculate outer hash
> if (rss[outer hash] & tunnel bit)
Tunnel bit, you mean tunneled packet, right?

> then
>   calculate inner hash
>   return rss[inner hash] & ~tunnel bit
Why mask off the tunnel bit at the end?

> else
>   return rss[outer hash]
> 
> 
> this fixes the security issue returning us back to status quo : specific 
> tunnels can
> be directed to separate queues.
>
The number of tunnels is far higher than the number of queues when a paravirtualized 
driver does the decap.
 
> 
> This is for RSS.
> 
> 
> For hash reporting indirection table is not used.
> Maybe it is enough to signal to driver that inner hash was used.
> We do need that signalling though.
> 
> My question would be whether it's practical to implement in hardware.

In the above example, having the hardware calculate a double hash is difficult and brings little gain.
Calculating on either the inner or the outer header makes sense.

Signaling whether the hash was calculated on the inner or the outer header is fine, because the hardware 
tells exactly what it did.




[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Michael S. Tsirkin
On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> +\subparagraph{Security risks between encapsulated packets and RSS}
> +There may be potential security risks when encapsulated packets use RSS to
> +select queues for placement.

Is this just with RSS? I assume hash calculation is also used for
something like queueing so there's a similar risk even just
with hash reporting, no?


> When a user inside a tunnel tries to control the
> +enqueuing of encapsulated packets, then the user can flood the device with 
> invalid
> +packets, and the flooded packets may be hashed into the same queue as 
> packets in
> +other normal tunnels, causing the queue to overflow.
> +
> +This can pose several security risks:
> +\begin{itemize}
> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to 
> queue
> +   overflow, resulting in a large amount of packet loss.
> +\item  The delay and retransmission of packets in the normal tunnels are 
> extremely increased.
> +\item  The user can observe the traffic information and enqueue information 
> of other normal
> +   tunnels, and conduct targeted DoS attacks.
> +\end{itemize}
> +


So for RSS specifically, we brainstormed with Amnon (Cc'd) and came
up with an idea: RSS indirection table entries are 16 bits, but
only 15 bits are used to identify an RX queue.
We can use the remaining bit as a "tunnel bit" to signal whether to use the
inner or the outer hash for queue selection.

The lookup will work like this then:

calculate outer hash
if (rss[outer hash] & tunnel bit)
then
calculate inner hash
return rss[inner hash] & ~tunnel bit
else
return rss[outer hash]


this fixes the security issue, returning us to the
status quo: specific tunnels can be directed to separate queues.
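The lookup above can be sketched in code. This is a hedged illustration of the proposed scheme only: bit 15 as the tunnel bit and plain modulo indexing are assumptions here, whereas a real device would index with a Toeplitz hash over the configured header fields.

```python
# Sketch of the proposed RSS lookup with a "tunnel bit": 16-bit indirection
# table entries, where the top bit (assumed bit 15) requests re-lookup with
# the inner header hash.
TUNNEL_BIT = 0x8000

def select_rx_queue(rss_table, outer_hash: int, inner_hash: int) -> int:
    entry = rss_table[outer_hash % len(rss_table)]
    if entry & TUNNEL_BIT:
        # Re-index with the inner header hash; mask the tunnel bit so the
        # result is a plain queue number.
        return rss_table[inner_hash % len(rss_table)] & ~TUNNEL_BIT
    return entry

# Example table: entry 1 redirects tunneled traffic to inner-hash lookup.
table = [0, TUNNEL_BIT | 0, 2, 3]
q_plain = select_rx_queue(table, outer_hash=2, inner_hash=7)   # entry 2 -> queue 2
q_inner = select_rx_queue(table, outer_hash=1, inner_hash=7)   # tunnel bit -> table[3] -> queue 3
```

Because the outer hash still gates the lookup, the driver keeps control over which outer flows (tunnels) ever reach the inner-hash path, which is what restores the status quo.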


This is for RSS.


For hash reporting, the indirection table is not used.
Maybe it is enough to signal to the driver that the inner hash was used.
We do need that signalling though.

My question would be whether it's practical to implement in hardware.

-- 
MST





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-23 Thread Michael S. Tsirkin
On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> Hi:
> 
> 在 2023/2/22 14:46, Heng Qi 写道:
> > Hi, Jason. Long time no see. :)
> > 
> > 在 2023/2/22 上午11:22, Jason Wang 写道:
> > > 
> > > 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
> > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > +There may be potential security risks when encapsulated
> > > > > packets use RSS to
> > > > > +select queues for placement. When a user inside a tunnel
> > > > > tries to control the
> > > 
> > > 
> > > What do you mean by "user" here? Is it a remote or local one?
> > > 
> > 
> > I mean a remote attacker who is not under the control of the tunnel
> > owner.
> 
> 
> Anything may the tunnel different? I think this can happen even without
> tunnel (and even with single queue).

I think you are missing the fact that a tunnel is normally a
security boundary: users within the tunnel cannot control
what happens outside.
This feature breaks the encapsulation somewhat.

For example, without tunneling it is possible
to create a special "bad guy queue" and direct specific tunnels
there by playing with the key and indirection table.

> How to mitigate those attackers seems more like a implementation details
> where might require fair queuing or other QOS technology which has been well
> studied.
> 
> It seems out of the scope of the spec (unless we want to let driver
> manageable QOS).
> 
> Thanks
> 
> 
> > 
> > Thanks.
> > 
> > > 
> > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > the device with invalid
> > > > > +packets, and the flooded packets may be hashed into the
> > > > > same queue as packets in
> > > > > +other normal tunnels, causing the queue to overflow.
> > > > > +
> > > > > +This can pose several security risks:
> > > > > +\begin{itemize}
> > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > enqueued due to queue
> > > > > +   overflow, resulting in a large amount of packet loss.
> > > > > +\item  The delay and retransmission of packets in the
> > > > > normal tunnels are extremely increased.
> > > > > +\item  The user can observe the traffic information and
> > > > > enqueue information of other normal
> > > > > +   tunnels, and conduct targeted DoS attacks.
> > > > > +\end{itemize}
> > > > > +
> > > > Hmm with this all written out it sounds pretty severe.
> > > 
> > > 
> > > I think we need first understand whether or not it's a problem that
> > > we need to solve at spec level:
> > > 
> > > 1) anything make encapsulated packets different or why we can't hit
> > > this problem without encapsulation
> > > 
> > > 2) whether or not it's the implementation details that the spec
> > > doesn't need to care (or how it is solved in real NIC)
> > > 
> > > Thanks
> > > 
> > > 
> > > > At this point with no ways to mitigate, I don't feel this is something
> > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > others  find this somehow useful e.g. for dpdk.
> > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > and asking them for the opinion?
> > > > 
> > > > 
> > 





Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-22 Thread Heng Qi




在 2023/2/23 上午10:50, Jason Wang 写道:

Hi:

在 2023/2/22 14:46, Heng Qi 写道:

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets use RSS to
+select queues for placement. When a user inside a tunnel tries to control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel 
owner.



Does anything make the tunnel different? I think this can happen even 
without a tunnel (and even with a single queue).


I agree.



Mitigating such attackers seems more like an implementation 
detail that might require fair queuing or other QoS technology, which 
has been well studied.


I am also not sure whether this point needs to be addressed in the 
spec; protection against tunnel DoS mostly happens outside the device.

Still, it seems reasonable to include some reminders about the attack.

Thanks.



It seems out of the scope of the spec (unless we want 
driver-manageable QoS).


Thanks




Thanks.



+enqueuing of encapsulated packets, then the user can flood the device with invalid
+packets, and the flooded packets may be hashed into the same queue as packets in
+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal tunnels are greatly increased.
+\item  The user can observe the traffic and enqueuing information of other normal
+   tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we first need to understand whether it's a problem that 
we need to solve at the spec level:


1) does anything make encapsulated packets different, i.e. why can't we hit 
this problem without encapsulation


2) whether it's an implementation detail that the spec 
doesn't need to care about (or how it is solved in real NICs)


Thanks



At this point with no ways to mitigate, I don't feel this is something
e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?










[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-22 Thread Jason Wang

Hi:

在 2023/2/22 14:46, Heng Qi 写道:

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets use RSS to
+select queues for placement. When a user inside a tunnel tries to control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel 
owner.



Does anything make the tunnel different? I think this can happen even without 
a tunnel (and even with a single queue).


Mitigating such attackers seems more like an implementation detail 
that might require fair queuing or other QoS technology, which has been 
well studied.


It seems out of the scope of the spec (unless we want 
driver-manageable QoS).


Thanks




Thanks.



+enqueuing of encapsulated packets, then the user can flood the device with invalid
+packets, and the flooded packets may be hashed into the same queue as packets in
+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal tunnels are greatly increased.
+\item  The user can observe the traffic and enqueuing information of other normal
+   tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we first need to understand whether it's a problem that 
we need to solve at the spec level:


1) does anything make encapsulated packets different, i.e. why can't we hit 
this problem without encapsulation


2) whether it's an implementation detail that the spec 
doesn't need to care about (or how it is solved in real NICs)


Thanks



At this point with no ways to mitigate, I don't feel this is something
e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?










[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-22 Thread Michael S. Tsirkin
On Wed, Feb 22, 2023 at 02:46:51PM +0800, Heng Qi wrote:
> Hi, Jason. Long time no see. :)
> 
> 在 2023/2/22 上午11:22, Jason Wang 写道:
> > 
> > 在 2023/2/22 01:50, Michael S. Tsirkin 写道:
> > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > +There may be potential security risks when encapsulated packets
> > > > use RSS to
> > > > +select queues for placement. When a user inside a tunnel tries
> > > > to control the
> > 
> > 
> > What do you mean by "user" here? Is it a remote or local one?
> > 
> 
> I mean a remote attacker who is not under the control of the tunnel owner.
> 
> Thanks.

OK let's just say "remote attacker" then.

> > 
> > > > +enqueuing of encapsulated packets, then the user can flood the
> > > > device with invalid
> > > > +packets, and the flooded packets may be hashed into the same
> > > > queue as packets in
> > > > +other normal tunnels, causing the queue to overflow.
> > > > +
> > > > +This can pose several security risks:
> > > > +\begin{itemize}
> > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > enqueued due to queue
> > > > +   overflow, resulting in a large amount of packet loss.
> > > > +\item  The delay and retransmission of packets in the normal
> > > > tunnels are extremely increased.
> > > > +\item  The user can observe the traffic information and enqueue
> > > > information of other normal
> > > > +   tunnels, and conduct targeted DoS attacks.
> > > > +\end{itemize}
> > > > +
> > > Hmm with this all written out it sounds pretty severe.
> > 
> > 
> > I think we need first understand whether or not it's a problem that we
> > need to solve at spec level:
> > 
> > 1) anything make encapsulated packets different or why we can't hit this
> > problem without encapsulation
> > 
> > 2) whether or not it's the implementation details that the spec doesn't
> > need to care (or how it is solved in real NIC)
> > 
> > Thanks
> > 
> > 
> > > At this point with no ways to mitigate, I don't feel this is something
> > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > others  find this somehow useful e.g. for dpdk.
> > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > and asking them for the opinion?
> > > 
> > > 





Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-22 Thread Michael S. Tsirkin
On Wed, Feb 22, 2023 at 03:03:32PM +0800, Heng Qi wrote:
> 
> 
> 在 2023/2/22 下午2:21, Michael S. Tsirkin 写道:
> > On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote:
> > > > The user will figure out how to mitigate when such QoS is not 
> > > > available. Either to run in best-effort mode or mitigate differently.
> > > Yes, our cloud security and cloud network team will configure and use 
> > > inner
> > > hash on dpdk.
> > Sounds good. More practical for dpdk than Linux.
> > Is there a chance that when the interface is close
> > to be final, but before the vote, you post a patch to the dpdk list and
> > get some acks from the maintainers, cc virtio-dev. This way we won't
> > merge something that will then go unused?
> > That would be best - do you have a prototype?
> 
> Not yet, dpdk and the business team are waiting for our virtio
> specification, and
> they have stated as a business team that their implementation on dpdk will
> not necessarily be open sourced to the community.

Ugh so no open source implementations at all :(


> > 
> > > In fact I discussed with them the security issues between
> > > tunnels,
> > > and I will quote their solutions to tunnel attacks below, but this is a
> > > problem between the tunnels, not the introduction of inner hash.
> > > I don't think we need to focus too much on this, but I'll do my best to
> > > describe the security issues between tunnels in v10.
> > > 
> > > "
> > > This is not a problem with the inner hash, it is a general problem with 
> > > the
> > > outer hash.
> > > I communicated with our people who are doing cloud security (they are also
> > > one of the demanders of inner hash),
> > > and it is a common problem for one tunnel to attack another tunnel.
> > > 
> > > For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, 
> > > and
> > > the vni id of t1 is id1, and the vni id of v2 is id2; a VM.
> > > 
> > > At this time, regardless of the inner hash or the outer hash, the traffic 
> > > of
> > > tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a
> > > single queue or multiple queues),
> > > and may be placed on the same queue to cause queue overflow.
> > Do note (and explain in spec?) that with just an outer hash and RSS it
> > is possible to configure the tunnels to use distict queues. Impossible
> > with this interface but arguably only works for a small number of
> > tunnels anyway.
> > 
> > > # Solutions:
> > More like mitigations.
> 
> Yes, you are right.
> 
> > 
> > > 1. Some current forwarding tools such as DPDK have good forwarding
> > > performance, and it is difficult to fill up the queue;
> > Oh that's a good point. If driver is generally faster than the device
> > and queues stay away from filling up there's no DoS.
> > I'd add this to the spec.
> 
> Ok.
> 
> > 
> > > 2. or switch the attack traffic to the attack clusters;
> > What is that?
> 
> This is done by the monitoring part outside the tunnel, which is also an
> important mitigation method they mentioned
> to prevent DoS between tunnels. For example, the monitoring part cuts off,
> limits or redirects the abnormal traffic of the tunnel.

This has to be outside the device though right?
Before traffic arrives at the device.

> > 
> > > 3. or connect the traffic of different tunnels to different network card
> > > ports or network devices.
> > Not sure how this is relevant. These a distinct outer MAC - with this
> > why do we need a tunnel?
> > 
> > > 4..
> > > "





Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi




在 2023/2/22 下午2:21, Michael S. Tsirkin 写道:

On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote:

The user will figure out how to mitigate when such QoS is not available: either 
run in best-effort mode or mitigate differently.

Yes, our cloud security and cloud network team will configure and use inner
hash on dpdk.

Sounds good. More practical for dpdk than Linux.
Is there a chance that when the interface is close
to be final, but before the vote, you post a patch to the dpdk list and
get some acks from the maintainers, cc virtio-dev. This way we won't
merge something that will then go unused?
That would be best - do you have a prototype?


Not yet, dpdk and the business team are waiting for our virtio 
specification, and
they have stated as a business team that their implementation on dpdk 
will not necessarily be open sourced to the community.





In fact I discussed with them the security issues between
tunnels,
and I will quote their solutions to tunnel attacks below, but this is a
problem between the tunnels, not the introduction of inner hash.
I don't think we need to focus too much on this, but I'll do my best to
describe the security issues between tunnels in v10.

"
This is not a problem with the inner hash, it is a general problem with the
outer hash.
I communicated with our people who are doing cloud security (they are also
one of the demanders of inner hash),
and it is a common problem for one tunnel to attack another tunnel.

For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and
the vni id of t1 is id1, and the vni id of t2 is id2; and a VM.

At this time, regardless of the inner hash or the outer hash, the traffic of
tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a
single queue or multiple queues),
and may be placed on the same queue to cause queue overflow.

Do note (and explain in the spec?) that with just an outer hash and RSS it
is possible to configure the tunnels to use distinct queues. That is impossible
with this interface, and arguably only works for a small number of
tunnels anyway.


# Solutions:

More like mitigations.


Yes, you are right.




1. Some current forwarding tools such as DPDK have good forwarding
performance, and it is difficult to fill up the queue;

Oh that's a good point. If driver is generally faster than the device
and queues stay away from filling up there's no DoS.
I'd add this to the spec.


Ok.




2. or switch the attack traffic to the attack clusters;

What is that?


This is done by the monitoring part outside the tunnel, which is also an 
important mitigation method they mentioned
to prevent DoS between tunnels. For example, the monitoring part cuts 
off, limits or redirects the abnormal traffic of the tunnel.





3. or connect the traffic of different tunnels to different network card
ports or network devices.

Not sure how this is relevant. These have a distinct outer MAC, and with that,
why do we need a tunnel?


4..
"






[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi

Hi, Jason. Long time no see. :)

在 2023/2/22 上午11:22, Jason Wang 写道:


在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets use RSS to
+select queues for placement. When a user inside a tunnel tries to control the



What do you mean by "user" here? Is it a remote or local one?



I mean a remote attacker who is not under the control of the tunnel owner.

Thanks.



+enqueuing of encapsulated packets, then the user can flood the device with invalid
+packets, and the flooded packets may be hashed into the same queue as packets in
+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal tunnels are greatly increased.
+\item  The user can observe the traffic and enqueuing information of other normal
+   tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we first need to understand whether it's a problem that we 
need to solve at the spec level:


1) does anything make encapsulated packets different, i.e. why can't we hit 
this problem without encapsulation


2) whether it's an implementation detail that the spec 
doesn't need to care about (or how it is solved in real NICs)


Thanks



At this point with no ways to mitigate, I don't feel this is something
e.g. Linux can enable.  I am not going to nack the spec patch if
others  find this somehow useful e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?








Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Michael S. Tsirkin
On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote:
> > The user will figure out how to mitigate when such QoS is not available. 
> > Either to run in best-effort mode or mitigate differently.
> 
> Yes, our cloud security and cloud network team will configure and use inner
> hash on dpdk.

Sounds good. More practical for dpdk than Linux.
Is there a chance that when the interface is close
to be final, but before the vote, you post a patch to the dpdk list and
get some acks from the maintainers, cc virtio-dev. This way we won't
merge something that will then go unused?
That would be best - do you have a prototype?

> In fact I discussed with them the security issues between
> tunnels,
> and I will quote their solutions to tunnel attacks below, but this is a
> problem between the tunnels, not the introduction of inner hash.
> I don't think we need to focus too much on this, but I'll do my best to
> describe the security issues between tunnels in v10.
> 
> "
> This is not a problem with the inner hash, it is a general problem with the
> outer hash.
> I communicated with our people who are doing cloud security (they are also
> one of the demanders of inner hash),
> and it is a common problem for one tunnel to attack another tunnel.
> 
> For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and
> the vni id of t1 is id1, and the vni id of v2 is id2; a VM.
> 
> At this time, regardless of the inner hash or the outer hash, the traffic of
> tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a
> single queue or multiple queues),
> and may be placed on the same queue to cause queue overflow.

Do note (and explain in the spec?) that with just an outer hash and RSS it
is possible to configure the tunnels to use distinct queues. That is impossible
with this interface, and arguably only works for a small number of
tunnels anyway.

> # Solutions:

More like mitigations.

> 1. Some current forwarding tools such as DPDK have good forwarding
> performance, and it is difficult to fill up the queue;

Oh that's a good point. If driver is generally faster than the device
and queues stay away from filling up there's no DoS.
I'd add this to the spec.

> 2. or switch the attack traffic to the attack clusters;

What is that?

> 3. or connect the traffic of different tunnels to different network card
> ports or network devices.

Not sure how this is relevant. These have a distinct outer MAC - with this,
why do we need a tunnel?

> 4..
> "


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Jason Wang



在 2023/2/22 01:50, Michael S. Tsirkin 写道:

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be potential security risks when encapsulated packets use RSS to
+select queues for placement. When a user inside a tunnel tries to control the



What do you mean by "user" here? Is it a remote or local one?



+enqueuing of encapsulated packets, then the user can flood the device with invalid
+packets, and the flooded packets may be hashed into the same queue as packets in
+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be enqueued due to 
queue
+   overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal tunnels are 
extremely increased.
+\item  The user can observe the traffic information and enqueue information of 
other normal
+   tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+

Hmm with this all written out it sounds pretty severe.



I think we first need to understand whether or not it's a problem that we 
need to solve at the spec level:


1) does anything make encapsulated packets different, i.e. why can't we hit this 
problem without encapsulation


2) whether or not it's an implementation detail that the spec doesn't 
need to care about (or how it is solved in real NICs)


Thanks



At this point with no ways to mitigate, I don't feel this is something
e.g. Linux can enable.  I am not going to nack the spec patch if
others find this somehow useful, e.g. for dpdk.
How about CC e.g. dpdk devs or whoever else is going to use this
and asking them for the opinion?








Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Heng Qi




在 2023/2/22 上午7:18, Michael S. Tsirkin 写道:

On Tue, Feb 21, 2023 at 10:32:11PM +, Parav Pandit wrote:

From: Michael S. Tsirkin 
Sent: Tuesday, February 21, 2023 4:46 PM

What is this information driver can't observe? It sees all the packets after 
all,
we are not stripping tunneling headers.

Just the tunnel type.
If/when that tunnel header is stripped, it gets complicated where tunnel type 
is still present in the virtio_net_hdr because hash_report_tunnel feature bit 
is negotiated.

whoever strips off the tunnel has, I imagine, to strip off the virtio net hdr
too - everything else in it such as gso type refers to the outer packet.


I also don't really know what are upper layer drivers - for sure layering of
drivers is not covered in the spec for now so I am not sure what do you mean by
that.  The risk I mentioned is leaking the information *on the network*.


Got it.





\begin{lstlisting}  struct virtio_net_rss_config {

  le32 hash_types;
+le32 hash_tunnel_types;

This field is not needed as device config space advertisement
for the support

is enough.

If the intent is to enable hashing for the specific tunnel(s),
an individual

command is better.

new command? I am not sure why we want that. why not handle
tunnels like we do other protocols?

I didn't follow.
We probably discussed in another thread that to set M bits, it is
wise to avoid

setting N other bits just to keep the command happy, where N >>> M
and these N have a very strong relation in hw resource setup and packet

steering.

Any examples of 'other protocols'?

#define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
#define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1)
#define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2)

this kind of thing.

I don't see how a tunnel is different fundamentally. Why does it
need its own field?

Driver is in control to enable/disable tunnel based inner hash acceleration

only when its needed.

This way certain data path hw parsers can be enabled/disabled.
Without this it will be always enabled even if there may not be any user of it.
Device has scope to optimize this flow.

I feel you misunderstand the question. Or maybe I misunderstand what you are
proposing.  So tunnels need their own bits. But why a separate field and not 
just
more bits along the existing ones?

Because the hashing is not covering the outer header contents.

We may be still not discussing the same.
So let me refresh the context.

The question of discussion was,
Scenario:
1. device advertises the ability to hash on the inner packet header.
2. device prefers that driver enable it only when it needs to use this extra 
packet parser in hardware.

There are 3 options.
a. Because the feature is negotiated, it means it is enabled for all the tunnel 
types.
Pros:
1. No need to extend cvq cmd.
Cons:
1. device parser is always enabled, and the driver never uses it. This may 
result in inferior rx performance.

b. Since the feature is useful in a narrow case of sw-based vxlan etc driver, 
better not to enable hw for it.
Hence, have the knob to explicitly enable in hw.
So have the cvq command.
b.1 should it be combined with the existing command?
Cons:
a. when the driver wants to enable hash on inner, it needs to supply the exact 
same RSS config as before. Sw overhead with no gain.
b. device needs to parse new command value, compare with old config, and drop 
the RSS config, just enable inner hashing hw parser.
Or destroy the old rss config and re-apply. This results in weird behavior for 
the short interval with no apparent gain.

b.2 should it be on its own command?
Pros:
a. device and driver doesn't need to bother about b.1.a and b.1.b.
b. still benefits from not always enabling hw parser, as this is not a common 
case.
c. has the ability to enable when needed.

I prefer b.1. With reporting of the tunnel type gone I don't see a
fundamental difference between hashing over tunneling types and other
protocol types we support.  It's just a flag telling device over which
bits to calculate the hash. We don't have a separate command for hashing
of TCPv6, why have it for vxlan?  Extending with more HASH_TYPE makes
total sense to me, seems to fit better with the existing design and will
make patch smaller.


+1.

It is infrequent to configure the *tunnel hash types* through commands, 
and when configuring the *hash types*,
the hash key and indirection table are not required either.










[virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Tuesday, February 21, 2023 6:18 PM

> > The question of discussion was,
> > Scenario:
> > 1. device advertises the ability to hash on the inner packet header.
> > 2. device prefers that driver enable it only when it needs to use this extra
> packet parser in hardware.
> >
> > There are 3 options.
> > a. Because the feature is negotiated, it means it is enabled for all the 
> > tunnel
> types.
> > Pros:
> > 1. No need to extend cvq cmd.
> > Cons:
> > 1. device parser is always enabled, and the driver never uses it. This may
> result in inferior rx performance.
> >
> > b. Since the feature is useful in a narrow case of sw-based vxlan etc 
> > driver,
> better not to enable hw for it.
> > Hence, have the knob to explicitly enable in hw.
> > So have the cvq command.
> > b.1 should it be combined with the existing command?
> > Cons:
> > a. when the driver wants to enable hash on inner, it needs to supply the 
> > exact
> same RSS config as before. Sw overhead with no gain.
> > b. device needs to parse new command value, compare with old config, and
> drop the RSS config, just enable inner hashing hw parser.
> > Or destroy the old rss config and re-apply. This results in weird behavior 
> > for
> the short interval with no apparent gain.
> >
> > b.2 should it be on its own command?
> > Pros:
> > a. device and driver doesn't need to bother about b.1.a and b.1.b.
> > b. still benefits from not always enabling hw parser, as this is not a 
> > common
> case.
> > c. has the ability to enable when needed.
> 
> I prefer b.1. With reporting of the tunnel type gone I don't see a fundamental
> difference between hashing over tunneling types and other protocol types we
> support.  It's just a flag telling device over which bits to calculate the 
> hash. We
> don't have a separate command for hashing of TCPv6, why have it for vxlan?
b.1 to always enable hw for multi-level packet processing is not very optimal 
for actual device implementation.
The difference is one level of header vs second-level hashing.
And the new hash type values have no use in sw.

> Extending with more HASH_TYPE makes total sense to me, seems to fit better
> with the existing design and will make patch smaller.




[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Michael S. Tsirkin
On Tue, Feb 21, 2023 at 10:32:11PM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Tuesday, February 21, 2023 4:46 PM
> > 
> > What is this information driver can't observe? It sees all the packets 
> > after all,
> > we are not stripping tunneling headers.
> Just the tunnel type.
> If/when that tunnel header is stripped, it gets complicated where tunnel type 
> is still present in the virtio_net_hdr because hash_report_tunnel feature bit 
> is negotiated.

whoever strips off the tunnel has, I imagine, to strip off the virtio net hdr
too - everything else in it such as gso type refers to the outer packet.

> > I also don't really know what are upper layer drivers - for sure layering of
> > drivers is not covered in the spec for now so I am not sure what do you 
> > mean by
> > that.  The risk I mentioned is leaking the information *on the network*.
> > 
> Got it.
> 
> > 
> > 
> > 
> > > > > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > > > > >  le32 hash_types;
> > > > > > > > +le32 hash_tunnel_types;
> > > > > > > This field is not needed as device config space advertisement
> > > > > > > for the support
> > > > > > is enough.
> > > > > > >
> > > > > > > If the intent is to enable hashing for the specific tunnel(s),
> > > > > > > an individual
> > > > > > command is better.
> > > > > >
> > > > > > new command? I am not sure why we want that. why not handle
> > > > > > tunnels like we do other protocols?
> > > > >
> > > > > I didn't follow.
> > > > > We probably discussed in another thread that to set M bits, it is
> > > > > wise to avoid
> > > > setting N other bits just to keep the command happy, where N >>> M
> > > > and these N have a very strong relation in hw resource setup and packet
> > steering.
> > > > > Any examples of 'other protocols'?
> > > >
> > > > #define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
> > > > #define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1)
> > > > #define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2)
> > > >
> > > > this kind of thing.
> > > >
> > > > I don't see how a tunnel is different fundamentally. Why does it
> > > > need its own field?
> > >
> > > Driver is in control to enable/disable tunnel based inner hash 
> > > acceleration
> > only when its needed.
> > > This way certain data path hw parsers can be enabled/disabled.
> > > Without this it will be always enabled even if there may not be any user 
> > > of it.
> > > Device has scope to optimize this flow.
> > 
> > I feel you misunderstand the question. Or maybe I misunderstand what you are
> > proposing.  So tunnels need their own bits. But why a separate field and 
> > not just
> > more bits along the existing ones?
> 
> Because the hashing is not covering the outer header contents.
> 
> We may be still not discussing the same.
> So let me refresh the context.
> 
> The question of discussion was,
> Scenario:
> 1. device advertises the ability to hash on the inner packet header.
> 2. device prefers that driver enable it only when it needs to use this extra 
> packet parser in hardware.
> 
> There are 3 options.
> a. Because the feature is negotiated, it means it is enabled for all the 
> tunnel types.
> Pros:
> 1. No need to extend cvq cmd.
> Cons:
> 1. device parser is always enabled, and the driver never uses it. This may 
> result in inferior rx performance.
> 
> b. Since the feature is useful in a narrow case of sw-based vxlan etc driver, 
> better not to enable hw for it.
> Hence, have the knob to explicitly enable in hw.
> So have the cvq command.
> b.1 should it be combined with the existing command?
> Cons:
> a. when the driver wants to enable hash on inner, it needs to supply the 
> exact same RSS config as before. Sw overhead with no gain.
> b. device needs to parse new command value, compare with old config, and drop 
> the RSS config, just enable inner hashing hw parser.
> Or destroy the old rss config and re-apply. This results in weird behavior 
> for the short interval with no apparent gain.
>
> b.2 should it be on its own command?
> Pros:
> a. device and driver doesn't need to bother about b.1.a and b.1.b.
> b. still benefits from not always enabling hw parser, as this is not a common 
> case.
> c. has the ability to enable when needed.

I prefer b.1. With reporting of the tunnel type gone I don't see a
fundamental difference between hashing over tunneling types and other
protocol types we support.  It's just a flag telling device over which
bits to calculate the hash. We don't have a separate command for hashing
of TCPv6, why have it for vxlan?  Extending with more HASH_TYPE makes
total sense to me, seems to fit better with the existing design and will
make patch smaller.


-- 
MST





[virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Tuesday, February 21, 2023 4:46 PM
> 
> What is this information driver can't observe? It sees all the packets after 
> all,
> we are not stripping tunneling headers.
Just the tunnel type.
If/when that tunnel header is stripped, it gets complicated where tunnel type 
is still present in the virtio_net_hdr because hash_report_tunnel feature bit 
is negotiated.

> I also don't really know what are upper layer drivers - for sure layering of
> drivers is not covered in the spec for now so I am not sure what do you mean 
> by
> that.  The risk I mentioned is leaking the information *on the network*.
> 
Got it.

> 
> 
> 
> > > > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > > > >  le32 hash_types;
> > > > > > > +le32 hash_tunnel_types;
> > > > > > This field is not needed as device config space advertisement
> > > > > > for the support
> > > > > is enough.
> > > > > >
> > > > > > If the intent is to enable hashing for the specific tunnel(s),
> > > > > > an individual
> > > > > command is better.
> > > > >
> > > > > new command? I am not sure why we want that. why not handle
> > > > > tunnels like we do other protocols?
> > > >
> > > > I didn't follow.
> > > > We probably discussed in another thread that to set M bits, it is
> > > > wise to avoid
> > > setting N other bits just to keep the command happy, where N >>> M
> > > and these N have a very strong relation in hw resource setup and packet
> steering.
> > > > Any examples of 'other protocols'?
> > >
> > > #define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
> > > #define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1)
> > > #define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2)
> > >
> > > this kind of thing.
> > >
> > > I don't see how a tunnel is different fundamentally. Why does it
> > > need its own field?
> >
> > Driver is in control to enable/disable tunnel based inner hash acceleration
> only when its needed.
> > This way certain data path hw parsers can be enabled/disabled.
> > Without this it will be always enabled even if there may not be any user of 
> > it.
> > Device has scope to optimize this flow.
> 
> I feel you misunderstand the question. Or maybe I misunderstand what you are
> proposing.  So tunnels need their own bits. But why a separate field and not 
> just
> more bits along the existing ones?

Because the hashing is not covering the outer header contents.

We may be still not discussing the same.
So let me refresh the context.

The question of discussion was,
Scenario:
1. device advertises the ability to hash on the inner packet header.
2. device prefers that driver enable it only when it needs to use this extra 
packet parser in hardware.

There are 3 options.
a. Because the feature is negotiated, it means it is enabled for all the tunnel 
types.
Pros:
1. No need to extend cvq cmd.
Cons:
1. device parser is always enabled, and the driver never uses it. This may 
result in inferior rx performance.

b. Since the feature is useful in a narrow case of sw-based vxlan etc driver, 
better not to enable hw for it.
Hence, have the knob to explicitly enable in hw.
So have the cvq command.
b.1 should it be combined with the existing command?
Cons:
a. when the driver wants to enable hash on inner, it needs to supply the exact 
same RSS config as before. Sw overhead with no gain.
b. device needs to parse new command value, compare with old config, and drop 
the RSS config, just enable inner hashing hw parser.
Or destroy the old rss config and re-apply. This results in weird behavior for 
the short interval with no apparent gain.

b.2 should it be on its own command?
Pros:
a. device and driver doesn't need to bother about b.1.a and b.1.b.
b. still benefits from not always enabling hw parser, as this is not a common 
case.
c. has the ability to enable when needed.





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Michael S. Tsirkin
On Tue, Feb 21, 2023 at 09:36:06PM +, Parav Pandit wrote:
> > So you are saying either live with the problem (this is best effort yes?) 
> Yes to best effort usage.

For sure something can be done to mitigate? How about randomizing the
key, for example? That's just one minute of thinking; I am guessing
more can be done.


> > 
> > 
> > > > > > +\item  The user can observe the traffic information and enqueue
> > > > > > +information
> > > > > > of other normal
> > > > > > +   tunnels, and conduct targeted DoS attacks.
> > > > > Once hash_report_tunnel_types is removed, this second attack is no
> > > > > longer
> > > > applicable.
> > > > > Hence, please remove this too.
> > > >
> > > >
> > > > ?
> > > > I don't get how removing a field helps DoS.
> > > >
> > > I meant for the "observe and enqueue" part of the tunnel as not 
> > > applicable.
> > 
> > Sorry still don't get it :( I don't know what is the "observe and enqueue" 
> > part of
> > the tunnel and what is not applicable. But maybe Heng Qi does.
> > 
> Tunnel type such as vxlan/gre etc is not placed in the virtio_net_hdr.
> This way the net_hdr doesn't leak such information to upper layer drivers as 
> it cannot observe it.

What is this information driver can't observe? It sees all the packets
after all, we are not stripping tunneling headers.
I also don't really know what are upper layer drivers - for sure
layering of drivers is not covered in the spec for now so I am not sure
what do you mean by that.  The risk I mentioned is leaking the
information *on the network*.




> > > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > > >  le32 hash_types;
> > > > > > +le32 hash_tunnel_types;
> > > > > This field is not needed as device config space advertisement for
> > > > > the support
> > > > is enough.
> > > > >
> > > > > If the intent is to enable hashing for the specific tunnel(s), an
> > > > > individual
> > > > command is better.
> > > >
> > > > new command? I am not sure why we want that. why not handle tunnels
> > > > like we do other protocols?
> > >
> > > I didn't follow.
> > > We probably discussed in another thread that to set M bits, it is wise to 
> > > avoid
> > setting N other bits just to keep the command happy, where N >>> M and these
> > N have a very strong relation in hw resource setup and packet steering.
> > > Any examples of 'other protocols'?
> > 
> > #define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
> > #define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1)
> > #define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2)
> > 
> > this kind of thing.
> > 
> > I don't see how a tunnel is different fundamentally. Why does it need its 
> > own
> > field?
> 
> Driver is in control to enable/disable tunnel based inner hash acceleration 
> only when its needed.
> This way certain data path hw parsers can be enabled/disabled.
> Without this it will be always enabled even if there may not be any user of 
> it.
> Device has scope to optimize this flow.

I feel you misunderstand the question. Or maybe I misunderstand what you
are proposing.  So tunnels need their own bits. But why a separate field
and not just more bits along the existing ones?

-- 
MST





[virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Parav Pandit


> From: Michael S. Tsirkin 
> Sent: Tuesday, February 21, 2023 4:24 PM
> 
> On Tue, Feb 21, 2023 at 07:29:20PM +, Parav Pandit wrote:
> > > > When a specific receive queue is shared to receive packets of
> > > > multiple
> > > tunnels, there is no quality of service for packets of multiple tunnels.
> > >
> > > "shared to receive" is not grammatical either :)
> > >
> > "Shared by multiple tunnels" will make it grammatical?
> 
> I think so, yes.
> 
> > > If you are talking about a security risk you need to explain
> > > 1- what is the threat, what configurations are affected.
> > > 2- what is the attack type: DOS, information leak, etc.
> > > 3- how to mitigate it
> > >
> > > This text touches a bit on 1 and 2 but not in an orderly way.
> > >
> > >
> > it is best effort based.
> >
> > #3 is outside the scope of this patch set.
> 
> Scope is from Greek for "target". It's what we are aiming for.
> If we document a security risk then I would say yes we should aim to provide
> not just problems but solutions too.
> 
> > > > +
> > > > > +This can pose several security risks:
> > > > > +\begin{itemize}
> > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > +enqueued due to
> > > > > queue
> > > > > +   overflow, resulting in a large amount of packet loss.
> > > > > +\item  The delay and retransmission of packets in the normal
> > > > > +tunnels are
> > > > > extremely increased.
> > > > This is something very protocol specific and doesn't belong here.
> > >
> > > I don't see how it's specific - many protocols have retransmission
> > > and are affected by delays. "extremely increased" sounds ungrammatical to
> me though.
> > >
> > >
> > I am not sure where you want to lead this discussion.
> 
> I just disagree that documenting timing effects does not belong in the spec.
> 
> > I am trying to help the spec and feature definition to be compact enough to
> progress.
> >
> > It is specific to a protocol(s) and somehow arbitrarily concluded with a 
> > large
> number of packet losses.
> > Maybe only one ICMP packet got dropped and retransmit was just one
> packet.
> > Maybe it was TCP with selective retransmit enabled/disabled.
> >
> > As far as receive side is concerned, it should say that there is no QoS 
> > among
> different tunnels.
> > The user will figure out how to mitigate when such QoS is not available.
> Either to run in best-effort mode or mitigate differently.
> 
> So you are saying either live with the problem (this is best effort yes?) 
Yes to best effort usage.

> 
> 
> > > > > +\item  The user can observe the traffic information and enqueue
> > > > > +information
> > > > > of other normal
> > > > > +   tunnels, and conduct targeted DoS attacks.
> > > > Once hash_report_tunnel_types is removed, this second attack is no
> > > > longer
> > > applicable.
> > > > Hence, please remove this too.
> > >
> > >
> > > ?
> > > I don't get how removing a field helps DoS.
> > >
> > I meant for the "observe and enqueue" part of the tunnel as not applicable.
> 
> Sorry still don't get it :( I don't know what is the "observe and enqueue" 
> part of
> the tunnel and what is not applicable. But maybe Heng Qi does.
> 
Tunnel type such as vxlan/gre etc is not placed in the virtio_net_hdr.
This way the net_hdr doesn't leak such information to upper layer drivers as it 
cannot observe it.

> > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > >  le32 hash_types;
> > > > > +le32 hash_tunnel_types;
> > > > This field is not needed as device config space advertisement for
> > > > the support
> > > is enough.
> > > >
> > > > If the intent is to enable hashing for the specific tunnel(s), an
> > > > individual
> > > command is better.
> > >
> > > new command? I am not sure why we want that. why not handle tunnels
> > > like we do other protocols?
> >
> > I didn't follow.
> > We probably discussed in another thread that to set M bits, it is wise to 
> > avoid
> setting N other bits just to keep the command happy, where N >>> M and these
> N have a very strong relation in hw resource setup and packet steering.
> > Any examples of 'other protocols'?
> 
> #define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
> #define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1)
> #define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2)
> 
> this kind of thing.
> 
> I don't see how a tunnel is different fundamentally. Why does it need its own
> field?

Driver is in control to enable/disable tunnel based inner hash acceleration 
only when its needed.
This way certain data path hw parsers can be enabled/disabled.
Without this it will be always enabled even if there may not be any user of it.
Device has scope to optimize this flow.




[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Michael S. Tsirkin
On Tue, Feb 21, 2023 at 07:29:20PM +, Parav Pandit wrote:
> > > When a specific receive queue is shared to receive packets of multiple
> > tunnels, there is no quality of service for packets of multiple tunnels.
> > 
> > "shared to receive" is not grammatical either :)
> > 
> "Shared by multiple tunnels" will make it grammatical?

I think so, yes.

> > If you are talking about a security risk you need to explain
> > 1- what is the threat, what configurations are affected.
> > 2- what is the attack type: DOS, information leak, etc.
> > 3- how to mitigate it
> > 
> > This text touches a bit on 1 and 2 but not in an orderly way.
> > 
> > 
> it is best effort based.
> 
> #3 is outside the scope of this patch set.

Scope is from Greek for "target". It's what we are aiming for.
If we document a security risk then I would say yes we should aim
to provide not just problems but solutions too.

> > > +
> > > > +This can pose several security risks:
> > > > +\begin{itemize}
> > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > +enqueued due to
> > > > queue
> > > > +   overflow, resulting in a large amount of packet loss.
> > > > +\item  The delay and retransmission of packets in the normal
> > > > +tunnels are
> > > > extremely increased.
> > > This is something very protocol specific and doesn't belong here.
> > 
> > I don't see how it's specific - many protocols have retransmission and are
> > affected by delays. "extremely increased" sounds ungrammatical to me though.
> > 
> > 
> I am not sure where you want to lead this discussion.

I just disagree that documenting timing effects does not belong in the
spec.

> I am trying to help the spec and feature definition to be compact enough to 
> progress.
> 
> It is specific to a protocol(s) and somehow arbitrarily concluded with a 
> large number of packet losses.
> Maybe only one ICMP packet got dropped and retransmit was just one packet.
> Maybe it was TCP with selective retransmit enabled/disabled.
> 
> As far as receive side is concerned, it should say that there is no QoS among 
> different tunnels.
> The user will figure out how to mitigate when such QoS is not available. 
> Either to run in best-effort mode or mitigate differently.

So you are saying either live with the problem (this is best effort yes?)
or find your own solutions? Such as?


> > > > +\item  The user can observe the traffic information and enqueue
> > > > +information
> > > > of other normal
> > > > +   tunnels, and conduct targeted DoS attacks.
> > > Once hash_report_tunnel_types is removed, this second attack is no longer
> > applicable.
> > > Hence, please remove this too.
> > 
> > 
> > ?
> > I don't get how removing a field helps DoS.
> > 
> I meant for the "observe and enqueue" part of the tunnel as not applicable. 

Sorry still don't get it :( I don't know what is the "observe and enqueue" part 
of the tunnel
and what is not applicable. But maybe Heng Qi does.

> > \begin{lstlisting}  struct virtio_net_rss_config {
> > > >  le32 hash_types;
> > > > +le32 hash_tunnel_types;
> > > This field is not needed as device config space advertisement for the 
> > > support
> > is enough.
> > >
> > > If the intent is to enable hashing for the specific tunnel(s), an 
> > > individual
> > command is better.
> > 
> > new command? I am not sure why we want that. why not handle tunnels like
> > we do other protocols?
> 
> I didn't follow.
> We probably discussed in another thread that to set M bits, it is wise to 
> avoid setting N other bits just to keep the command happy, where N >>> M and 
> these N have a very strong relation in hw resource setup and packet steering.
> Any examples of 'other protocols'?

#define VIRTIO_NET_HASH_TYPE_IPv4  (1 << 0)
#define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1)
#define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2)

this kind of thing.

I don't see how a tunnel is different fundamentally. Why does it need
its own field?

-- 
MST





[virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Tuesday, February 21, 2023 12:06 PM
> 
> On Tue, Feb 21, 2023 at 04:20:59AM +, Parav Pandit wrote:
> >
> > > From: Heng Qi 
> > > Sent: Saturday, February 18, 2023 9:37 AM
> >
> > > If the tunnel is used to encapsulate the packets, the hash
> > > calculated using the
> > s/hash calculated/hash is calculated
> >
> > > outer header of the receive packets is always fixed for the same
> > > flow packets, i.e. they will be steered to the same receive queue.
> > >
> > A little descriptive commit message like below reads better to me.
> > Currently, when a received packet is an encapsulated packet, meaning there
> is an outer and an inner header, the virtio device is unable to calculate the 
> hash for
> the inner header.
> > Due to this limitation, multiple different flows identified by the inner 
> > header
> for the same outer header result in selecting the same receive queue.
> > This effectively disables the RSS, resulting in poor receive performance.
> >
> > Hence, to overcome this limitation, a new feature is introduced using a
> feature bit VIRTIO_NET_F_HASH_TUNNEL.
> > This feature enables the device to advertise the capability to calculate the
> hash for the inner packet header.
> > Thereby regaining better RSS performance in presence of outer packet
> header.
> 
> I think this is a good description however Parav I think it is important to 
> make
> contributors write their own commit messages so they know what is the reason
> for the proposed change. 
Sure. The contributor can rewrite it.

> What's good for the goose is good for the gander -
> contributors should explain why their change to the spec is beneficial but
> reviewers
> should also explain why their changes to the patch are beneficial, and "reads
> better to me" does not cut it - it does not allow the contributor to improve 
> with
> time.  It's more than about a single contribution, see?
> 
I provided an example template on how to write a problem_description -> solution
commit log.

At the beginning, I said "little descriptive" to explain why it reads better.
But it seems even more verbosity is needed, even when a reviewer makes a
suggestion.
I haven't often seen other reviewers do this, but I take note of it now; at
least I can improve from this feedback.

I imagined that the contributor would see the problem -> solution pattern in the
example commit log and follow it in future patches.
Using the current patch as the example was the best way to show how to write it.

> In this case I would say the issue is that motivation for the change is never
> explained.

> > When a specific receive queue is shared to receive packets of multiple
> tunnels, there is no quality of service for packets of multiple tunnels.
> 
> "shared to receive" is not grammatical either :)
> 
Will "shared by multiple tunnels" make it grammatical?

> If you are talking about a security risk you need to explain
> 1- what is the threat, what configurations are affected.
> 2- what is the attack type: DOS, information leak, etc.
> 3- how to mitigate it
> 
> This text touches a bit on 1 and 2 but not in an orderly way.
> 
> 
It is best-effort based.

#3 is outside the scope of this patch set.

> > +
> > > +This can pose several security risks:
> > > +\begin{itemize}
> > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > +enqueued due to
> > > queue
> > > +   overflow, resulting in a large amount of packet loss.
> > > +\item  The delay and retransmission of packets in the normal
> > > +tunnels are
> > > extremely increased.
> > This is something very protocol specific and doesn't belong here.
> 
> I don't see how it's specific - many protocols have retransmission and are
> affected by delays. "extremely increased" sounds ungrammatical to me though.
> 
> 
I am not sure where you want to lead this discussion.
I am trying to help keep the spec and feature definition compact enough to
make progress.

It is specific to particular protocols, and it somewhat arbitrarily concludes
with a large amount of packet loss.
Maybe only one ICMP packet got dropped and the retransmit was just one packet.
Maybe it was TCP with selective retransmit enabled/disabled.

As far as the receive side is concerned, it should say that there is no QoS
among different tunnels.
The user will figure out how to mitigate when such QoS is not available:
either run in best-effort mode or mitigate differently.
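The missing-QoS point can be illustrated with a toy model. Everything here is an assumption made for the sketch (the `rxq` struct, the fixed depth, the enqueue helper are not from the spec): once flows from a hostile tunnel and a victim tunnel hash to the same receive queue, they compete for the same finite descriptors, and the victim's packets are dropped on overflow.

```c
#define QUEUE_DEPTH 4   /* toy descriptor count for one receive queue */

struct rxq {
    int used;           /* descriptors currently occupied */
};

/* Try to enqueue one received packet into the shared queue;
 * returns 0 if the queue is full and the packet is dropped. */
static int rxq_enqueue(struct rxq *q)
{
    if (q->used >= QUEUE_DEPTH)
        return 0;
    q->used++;
    return 1;
}
```

With no per-tunnel accounting, nothing in this model distinguishes whose packets filled the queue; that is the absence of QoS being discussed.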

> > > +\item  The user can observe the traffic information and enqueue
> > > +information
> > > of other normal
> > > +   tunnels, and conduct targeted DoS attacks.
> > Once hash_report_tunnel_types is removed, this second attack is no longer
> applicable.
> > Hence, please remove this too.
> 
> 
> ?
> I don't get how removing a field helps DoS.
> 
I meant that the "observe and enqueue" part of the tunnel is not applicable.

> \begin{lstlisting}  struct virtio_net_rss_config {
> > >  le32 hash_types;
> > > +le32 hash_tunnel_types;
> > This field is not needed as 

[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Michael S. Tsirkin
On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> +\subparagraph{Security risks between encapsulated packets and RSS}
> +There may be potential security risks when encapsulated packets use RSS to
> +select queues for placement. When a user inside a tunnel tries to control the
> +enqueuing of encapsulated packets, the user can flood the device with invalid
> +packets, and the flooded packets may be hashed into the same queue as packets
> +in other normal tunnels, causing the queue to overflow.
> +
> +This can pose several security risks:
> +\begin{itemize}
> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to 
> queue
> +   overflow, resulting in a large amount of packet loss.
> +\item  The delay and retransmission of packets in the normal tunnels are 
> extremely increased.
> +\item  The user can observe the traffic information and enqueue information 
> of other normal
> +   tunnels, and conduct targeted DoS attacks.
> +\end{itemize}
> +

Hmm, with this all written out it sounds pretty severe.
At this point, with no way to mitigate, I don't feel this is something
e.g. Linux can enable.  I am not going to nack the spec patch if
others find this somehow useful, e.g. for dpdk.
How about CCing e.g. dpdk devs or whoever else is going to use this
and asking them for their opinion?


-- 
MST





[virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash

2023-02-21 Thread Michael S. Tsirkin
On Tue, Feb 21, 2023 at 04:20:59AM +, Parav Pandit wrote:
> 
> > From: Heng Qi 
> > Sent: Saturday, February 18, 2023 9:37 AM
> 
> > If the tunnel is used to encapsulate the packets, the hash calculated using 
> > the
> s/hash calculated/hash is calculated
> 
> > outer header of the receive packets is always fixed for the same flow 
> > packets,
> > i.e. they will be steered to the same receive queue.
> > 
> A little descriptive commit message like below reads better to me.
> Currently, when a received packet is an encapsulated packet, meaning there is
> an outer and an inner header, the virtio device is unable to calculate the
> hash for the inner header.
> Due to this limitation, multiple different flows identified by the inner
> header for the same outer header result in selecting the same receive queue.
> This effectively disables RSS, resulting in poor receive performance.
> 
> Hence, to overcome this limitation, a new feature is introduced using a 
> feature bit VIRTIO_NET_F_HASH_TUNNEL.
> This feature enables the device to advertise the capability to calculate the 
> hash for the inner packet header.
> Thereby regaining better RSS performance in the presence of an outer packet header.

I think this is a good description however Parav I think it is important
to make contributors write their own commit messages so they know what
is the reason for the proposed change. What's good for the goose is good
for the gander - contributors should explain why their change to the spec is
beneficial but reviewers should also explain why their changes to the
patch are beneficial, and "reads better to me" does not cut it - it does
not allow the contributor to improve with time.  It's more than about a
single contribution, see?

In this case I would say the issue is that motivation for the
change is never explained.

I am yet to review the patchset.

> 
> > We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks in
> > \field{hash_tunnel_types}, which instructs the device to calculate the hash
> > using the inner headers of tunnel-encapsulated packets. Note that
> > VIRTIO_NET_F_HASH_TUNNEL only indicates the ability of the inner header
> > hash, and does not give the device the ability to use the hash value to 
> > select a
> > receiving queue to place the packet.
> > 
> > Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL is added to report
> > an encapsulation type, and the feature depends on
> > VIRTIO_NET_F_HASH_REPORT.
> 
> As we discussed, the tunnel type alone is not useful to the sw, neither as an
> individual field nor merged with some other field.
> Hence, please remove this feature bit. HASH_TUNNEL is good enough.
> Please remove the references to it at more places below.
> 
> > It only means that the encapsulation type can be reported, it cannot 
> > instruct
> > the device to calculate the hash.
> > 
> 
> > 
> > +\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
> > +   for tunnel-encapsulated packets.
> > +
> > +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an
> > encapsulation type.
> > +
> Please remove this.
> 
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications
> > coalescing.
> > 
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > @@ -3140,6 +3145,8 @@ \subsubsection{Feature bit
> > requirements}\label{sec:Device Types / Network Device
> > \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or
> > VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires
> > VIRTIO_NET_F_HASH_REPORT.
> >  \end{description}
> > 
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types /
> > Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3199,20
> > +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types
> > / Network Device
> >  u8 rss_max_key_size;
> >  le16 rss_max_indirection_table_length;
> >  le32 supported_hash_types;
> > +le32 supported_tunnel_hash_types;
> >  };
> >  \end{lstlisting}
> > -The following field, \field{rss_max_key_size} only exists if 
> > VIRTIO_NET_F_RSS
> > or VIRTIO_NET_F_HASH_REPORT is set.
> > +The following field, \field{rss_max_key_size} only exists if 
> > VIRTIO_NET_F_RSS,
> > VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
> >  It specifies the maximum supported length of RSS key in bytes.
> > 
> >  The following field, \field{rss_max_indirection_table_length} only exists 
> > if
> > VIRTIO_NET_F_RSS is set.
> >  It specifies the maximum number of 16-bit entries in RSS indirection table.
> > 
> >  The next field, \field{supported_hash_types} only exists if the device 
> > supports
> > hash calculation, -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is
> > set.
> > +i.e. 

[virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash

2023-02-20 Thread Parav Pandit


> From: Heng Qi 
> Sent: Saturday, February 18, 2023 9:37 AM

> If the tunnel is used to encapsulate the packets, the hash calculated using 
> the
s/hash calculated/hash is calculated

> outer header of the receive packets is always fixed for the same flow packets,
> i.e. they will be steered to the same receive queue.
> 
A little descriptive commit message like below reads better to me.

Currently, when a received packet is an encapsulated packet, meaning there is an
outer and an inner header, the virtio device is unable to calculate the hash for
the inner header.
Due to this limitation, multiple different flows identified by the inner header
for the same outer header result in selecting the same receive queue.
This effectively disables RSS, resulting in poor receive performance.

Hence, to overcome this limitation, a new feature is introduced using a feature 
bit VIRTIO_NET_F_HASH_TUNNEL.
This feature enables the device to advertise the capability to calculate the 
hash for the inner packet header.
Thereby regaining better RSS performance in the presence of an outer packet header.
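The behavior this commit message describes can be sketched as a toy model. This is illustrative only, not device code: the `pkt` struct, `select_rxq`, and the modulo standing in for the real RSS indirection table are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical packet descriptor, for illustration only. */
struct pkt {
    int encapsulated;    /* nonzero if an inner header is present */
    uint32_t outer_hash; /* hash computed over the outer headers  */
    uint32_t inner_hash; /* hash computed over the inner headers  */
};

/* Sketch of queue selection: with inner-header hashing enabled
 * (VIRTIO_NET_F_HASH_TUNNEL negotiated and the tunnel type enabled),
 * the device hashes the inner header, so distinct inner flows that
 * share one outer header spread across receive queues. */
static uint16_t select_rxq(const struct pkt *p, int hash_tunnel_enabled,
                           uint16_t nqueues)
{
    uint32_t h = (hash_tunnel_enabled && p->encapsulated)
                     ? p->inner_hash
                     : p->outer_hash;
    return (uint16_t)(h % nqueues); /* stand-in for the indirection table */
}
```

With inner-header hashing off, two flows with the same outer header always collapse onto one queue; with it on, they can land on different queues, which is the performance gain claimed above.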

> We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks in
> \field{hash_tunnel_types}, which instructs the device to calculate the hash
> using the inner headers of tunnel-encapsulated packets. Note that
> VIRTIO_NET_F_HASH_TUNNEL only indicates the ability of the inner header
> hash, and does not give the device the ability to use the hash value to 
> select a
> receiving queue to place the packet.
> 
> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL is added to report
> an encapsulation type, and the feature depends on
> VIRTIO_NET_F_HASH_REPORT.

As we discussed, the tunnel type alone is not useful to the sw, neither as an
individual field nor merged with some other field.
Hence, please remove this feature bit. HASH_TUNNEL is good enough.
Please remove the references to it at more places below.

> It only means that the encapsulation type can be reported, it cannot instruct
> the device to calculate the hash.
> 

> 
> +\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
> + for tunnel-encapsulated packets.
> +
> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an
> encapsulation type.
> +
Please remove this.

>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications
> coalescing.
> 
>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3145,8 @@ \subsubsection{Feature bit
> requirements}\label{sec:Device Types / Network Device
> \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or
> VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires
> VIRTIO_NET_F_HASH_REPORT.
>  \end{description}
> 
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types /
> Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3199,20
> +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types
> / Network Device
>  u8 rss_max_key_size;
>  le16 rss_max_indirection_table_length;
>  le32 supported_hash_types;
> +le32 supported_tunnel_hash_types;
>  };
>  \end{lstlisting}
> -The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS
> or VIRTIO_NET_F_HASH_REPORT is set.
> +The following field, \field{rss_max_key_size} only exists if 
> VIRTIO_NET_F_RSS,
> VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
>  It specifies the maximum supported length of RSS key in bytes.
> 
>  The following field, \field{rss_max_indirection_table_length} only exists if
> VIRTIO_NET_F_RSS is set.
>  It specifies the maximum number of 16-bit entries in RSS indirection table.
> 
>  The next field, \field{supported_hash_types} only exists if the device 
> supports
> hash calculation, -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is
> set.
> +i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
> VIRTIO_NET_F_HASH_TUNNEL is set.
> 
>  Field \field{supported_hash_types} contains the bitmask of supported hash
> types.
>  See \ref{sec:Device Types / Network Device / Device Operation / Processing of
> Incoming Packets / Hash calculation for incoming packets / Supported/enabled
> hash types} for details of supported hash types.
> 
> +The next field, \field{supported_tunnel_hash_types} only exists if the
> +device supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is
> set.
> +
> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported
> tunnel hash types.
> +See \ref{sec:Device Types / Network Device / Device Operation / Processing
> of Incoming Packets / Hash calculation for incoming packets /
> Supported/enabled tunnel hash types} for details of supported tunnel hash
> types.
> +
>  \devicenormative{\subsubsection}{Device