Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the received packets is always fixed for packets of the same flow, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Doesn't everyone stick the inner header hash in the outer source port to solve this? For example, the geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port. The same goes for vxlan; I did not check further. So what is the problem? And which tunnel types actually suffer from it?

Inner hash can at least hash tunnel flows without outer transport headers, like GRE, to multiple queues, which is beneficial to us. For tunnel flows with outer transport headers, like VXLAN, although they can hash flows to different queues by setting different outer udp ports, this does not conflict with inner hash; inner hashing can also be used for this purpose. For the same flow, packets in the receiving and sending directions may pass through different tunnels respectively, which causes the same flow to be hashed to different queues. In this case, we have to calculate a symmetric hash (which can be called an inner symmetric hash, a type of inner hash) over the inner header, so that the same flow can be hashed to the same queue. Symmetric hashing ignores the order of the 5-tuple when calculating the hash, that is, the hash values calculated for (a1, a2, a3, a4) and (a2, a1, a4, a3) are the same. Thanks.

- To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
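[Editor's sketch] The symmetric property described above can be illustrated as follows. This is a stand-in, not a hash any device is required to implement; a real device would feed a canonicalized tuple into Toeplitz or a similar keyed hash.

```c
#include <stdint.h>

/* Illustrative symmetric hash over a 4-tuple: XOR is commutative, so
 * swapping (saddr, daddr) together with (sport, dport) cannot change
 * the result, and both directions of a flow map to the same value.
 * Not a spec-mandated algorithm. */
static uint32_t symmetric_hash(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport)
{
    uint32_t addrs = saddr ^ daddr;
    uint32_t ports = (uint32_t)sport ^ (uint32_t)dport;
    return addrs ^ (ports * 2654435761u); /* multiplicative mixing */
}
```

With this, packets of a flow and of its reverse direction (as seen through two different tunnels) select the same receive queue.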
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/22 7:18 AM, Michael S. Tsirkin wrote: On Tue, Feb 21, 2023 at 10:32:11PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Tuesday, February 21, 2023 4:46 PM

What is this information the driver can't observe? It sees all the packets after all; we are not stripping tunneling headers.

Just the tunnel type. If/when that tunnel header is stripped, it gets complicated where the tunnel type is still present in the virtio_net_hdr because the hash_report_tunnel feature bit is negotiated.

Whoever strips off the tunnel has, I imagine, to strip off the virtio net hdr too - everything else in it such as gso type refers to the outer packet. I also don't really know what upper layer drivers are - for sure layering of drivers is not covered in the spec for now, so I am not sure what you mean by that. The risk I mentioned is leaking the information *on the network*.

Got it.

\begin{lstlisting} struct virtio_net_rss_config { le32 hash_types; +le32 hash_tunnel_types;

This field is not needed, as device config space advertisement of the support is enough. If the intent is to enable hashing for the specific tunnel(s), an individual command is better.

A new command? I am not sure why we want that. Why not handle tunnels like we do other protocols?

I didn't follow. We probably discussed in another thread that to set M bits, it is wise to avoid setting N other bits just to keep the command happy, where N >>> M and these N have a very strong relation in hw resource setup and packet steering. Any examples of 'other protocols'?

#define VIRTIO_NET_HASH_TYPE_IPv4 (1 << 0) #define VIRTIO_NET_HASH_TYPE_TCPv4 (1 << 1) #define VIRTIO_NET_HASH_TYPE_UDPv4 (1 << 2) this kind of thing. I don't see how a tunnel is fundamentally different. Why does it need its own field?

The driver is in control, to enable/disable tunnel based inner hash acceleration only when it is needed. This way certain data path hw parsers can be enabled/disabled. Without this it will always be enabled even if there may not be any user of it.
The device has scope to optimize this flow.

I feel you misunderstand the question. Or maybe I misunderstand what you are proposing. So tunnels need their own bits. But why a separate field and not just more bits along the existing ones?

Because the hashing is not covering the outer header contents. We may still not be discussing the same thing, so let me refresh the context. The question under discussion was:

Scenario: 1. the device advertises the ability to hash on the inner packet header. 2. the device prefers that the driver enable it only when it needs to use this extra packet parser in hardware.

There are 3 options.

a. Because the feature is negotiated, it is enabled for all the tunnel types. Pros: 1. No need to extend the cvq cmd. Cons: 1. the device parser is always enabled, even if the driver never uses it. This may result in inferior rx performance.

b. Since the feature is useful in the narrow case of a sw-based vxlan etc. driver, better not to enable hw for it unconditionally. Hence, have a knob to explicitly enable it in hw, i.e. have the cvq command.

b.1 Should it be combined with the existing command? Cons: a. when the driver wants to enable hash on inner, it needs to supply the exact same RSS config as before. Sw overhead with no gain. b. the device needs to parse the new command value, compare it with the old config, and drop the RSS config just to enable the inner hashing hw parser, or destroy the old rss config and re-apply it. This results in weird behavior for a short interval with no apparent gain.

b.2 Should it be its own command? Pros: a. device and driver don't need to bother about b.1.a and b.1.b. b. still benefits from not always enabling the hw parser, as this is not a common case. c. has the ability to enable it when needed.

I prefer b.1. With reporting of the tunnel type gone I don't see a fundamental difference between hashing over tunneling types and the other protocol types we support. It's just a flag telling the device over which bits to calculate the hash.
We don't have a separate command for hashing of TCPv6, why have it for vxlan? Extending with more HASH_TYPE bits makes total sense to me, seems to fit better with the existing design and will make the patch smaller.

+1. It is infrequent to configure the *tunnel hash types* through commands, and when configuring the *hash types*, the hash key and indirection table are not required either.
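[Editor's sketch] The "more bits along the existing ones" approach discussed above can be written out like this. The tunnel bit positions are hypothetical placeholders, not values assigned by the spec.

```c
#include <stdint.h>

/* Existing hash type bits from the virtio-net spec. */
#define VIRTIO_NET_HASH_TYPE_IPv4   (1u << 0)
#define VIRTIO_NET_HASH_TYPE_TCPv4  (1u << 1)
#define VIRTIO_NET_HASH_TYPE_UDPv4  (1u << 2)

/* Hypothetical inner-hash tunnel bits in the same hash_types field;
 * the actual bit numbers would be assigned by the spec. */
#define VIRTIO_NET_HASH_TYPE_GRE_INNER   (1u << 16)
#define VIRTIO_NET_HASH_TYPE_VXLAN_INNER (1u << 17)

/* A device implementation could gate its extra inner-header parser on
 * whether the driver actually enabled any tunnel bit. */
static int inner_parser_needed(uint32_t hash_types)
{
    return (hash_types & (VIRTIO_NET_HASH_TYPE_GRE_INNER |
                          VIRTIO_NET_HASH_TYPE_VXLAN_INNER)) != 0;
}
```

This keeps the driver in control (the concern raised above) without introducing a separate field or command: the device only spends the hw parser when a tunnel bit is set.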
Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote: > > The user will figure out how to mitigate when such QoS is not available. > > Either run in best-effort mode or mitigate differently. > > Yes, our cloud security and cloud network team will configure and use inner > hash on dpdk. Sounds good. More practical for dpdk than Linux. Is there a chance that when the interface is close to final, but before the vote, you post a patch to the dpdk list and get some acks from the maintainers, cc virtio-dev? This way we won't merge something that will then go unused. That would be best - do you have a prototype? > In fact I discussed with them the security issues between > tunnels, > and I will quote their solutions to tunnel attacks below, but this is a > problem between the tunnels, not one introduced by inner hash. > I don't think we need to focus too much on this, but I'll do my best to > describe the security issues between tunnels in v10. > > " > This is not a problem with the inner hash, it is a general problem with the > outer hash. > I communicated with our people who are doing cloud security (they are also > one of the demanders of inner hash), > and it is a common problem for one tunnel to attack another tunnel. > > For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and > the vni id of t1 is id1, and the vni id of t2 is id2; a VM. > > At this time, regardless of the inner hash or the outer hash, the traffic of > tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a > single queue or multiple queues), > and may be placed on the same queue, causing queue overflow. Do note (and explain in spec?) that with just an outer hash and RSS it is possible to configure the tunnels to use distinct queues. Impossible with this interface but arguably only works for a small number of tunnels anyway. > # Solutions: More like mitigations. > 1.
Some current forwarding tools such as DPDK have good forwarding > performance, and it is difficult to fill up the queue; Oh that's a good point. If the driver is generally faster than the device and the queues stay away from filling up, there's no DoS. I'd add this to the spec. > 2. or switch the attack traffic to the attack clusters; What is that? > 3. or connect the traffic of different tunnels to different network card > ports or network devices. Not sure how this is relevant. These have a distinct outer MAC - with this why do we need a tunnel? > 4.. > "
Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
On 2023/2/22 2:21 PM, Michael S. Tsirkin wrote: On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote: The user will figure out how to mitigate when such QoS is not available. Either run in best-effort mode or mitigate differently. Yes, our cloud security and cloud network team will configure and use inner hash on dpdk.

Sounds good. More practical for dpdk than Linux. Is there a chance that when the interface is close to final, but before the vote, you post a patch to the dpdk list and get some acks from the maintainers, cc virtio-dev? This way we won't merge something that will then go unused. That would be best - do you have a prototype?

Not yet, dpdk and the business team are waiting for our virtio specification, and they have stated as a business team that their implementation on dpdk will not necessarily be open sourced to the community.😅

In fact I discussed with them the security issues between tunnels, and I will quote their solutions to tunnel attacks below, but this is a problem between the tunnels, not one introduced by inner hash. I don't think we need to focus too much on this, but I'll do my best to describe the security issues between tunnels in v10.

" This is not a problem with the inner hash, it is a general problem with the outer hash. I communicated with our people who are doing cloud security (they are also one of the demanders of inner hash), and it is a common problem for one tunnel to attack another tunnel. For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and the vni id of t1 is id1, and the vni id of t2 is id2; a VM. At this time, regardless of the inner hash or the outer hash, the traffic of tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a single queue or multiple queues), and may be placed on the same queue, causing queue overflow.

Do note (and explain in spec?) that with just an outer hash and RSS it is possible to configure the tunnels to use distinct queues.
Impossible with this interface, but arguably that only works for a small number of tunnels anyway.

# Solutions: More like mitigations. Yes, you are right.

1. Some current forwarding tools such as DPDK have good forwarding performance, and it is difficult to fill up the queue; Oh that's a good point. If the driver is generally faster than the device and the queues stay away from filling up, there's no DoS. I'd add this to the spec. Ok.

2. or switch the attack traffic to the attack clusters; What is that? This is done by the monitoring part outside the tunnel, which is also an important mitigation method they mentioned to prevent DoS between tunnels. For example, the monitoring part cuts off, limits or redirects the abnormal traffic of the tunnel.

3. or connect the traffic of different tunnels to different network card ports or network devices. Not sure how this is relevant. These have a distinct outer MAC - with this why do we need a tunnel?

4.. "
Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
On Wed, Feb 22, 2023 at 03:03:32PM +0800, Heng Qi wrote: > > > On 2023/2/22 2:21 PM, Michael S. Tsirkin wrote: > > On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote: > > > > The user will figure out how to mitigate when such QoS is not > > > > available. Either run in best-effort mode or mitigate differently. > > > Yes, our cloud security and cloud network team will configure and use > > > inner > > > hash on dpdk. > > Sounds good. More practical for dpdk than Linux. > > Is there a chance that when the interface is close > > to final, but before the vote, you post a patch to the dpdk list and > > get some acks from the maintainers, cc virtio-dev? This way we won't > > merge something that will then go unused. > > That would be best - do you have a prototype? > > Not yet, dpdk and the business team are waiting for our virtio > specification, and > they have stated as a business team that their implementation on dpdk will > not necessarily be open sourced to the community.😅 Ugh so no open source implementations at all :( > > > > > In fact I discussed with them the security issues between > > > tunnels, > > > and I will quote their solutions to tunnel attacks below, but this is a > > > problem between the tunnels, not one introduced by inner hash. > > > I don't think we need to focus too much on this, but I'll do my best to > > > describe the security issues between tunnels in v10. > > > > > > " > > > This is not a problem with the inner hash, it is a general problem with > > > the > > > outer hash. > > > I communicated with our people who are doing cloud security (they are also > > > one of the demanders of inner hash), > > > and it is a common problem for one tunnel to attack another tunnel. > > > > > > For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, > > > and > > > the vni id of t1 is id1, and the vni id of t2 is id2; a VM.
> > > > > > At this time, regardless of the inner hash or the outer hash, the traffic > > > of > > > tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a > > > single queue or multiple queues), > > > and may be placed on the same queue, causing queue overflow. > > Do note (and explain in spec?) that with just an outer hash and RSS it > > is possible to configure the tunnels to use distinct queues. Impossible > > with this interface but arguably only works for a small number of > > tunnels anyway. > > > > > # Solutions: > > More like mitigations. > > Yes, you are right. > > > > > > 1. Some current forwarding tools such as DPDK have good forwarding > > > performance, and it is difficult to fill up the queue; > > Oh that's a good point. If the driver is generally faster than the device > > and the queues stay away from filling up there's no DoS. > > I'd add this to the spec. > > Ok. > > > > > > 2. or switch the attack traffic to the attack clusters; > > What is that? > > This is done by the monitoring part outside the tunnel, which is also an > important mitigation method they mentioned > to prevent DoS between tunnels. For example, the monitoring part cuts off, > limits or redirects the abnormal traffic of the tunnel. This has to be outside the device though, right? Before traffic arrives at the device. > > > > > 3. or connect the traffic of different tunnels to different network card > > ports or network devices. > > Not sure how this is relevant. These have a distinct outer MAC - with this > > why do we need a tunnel? > > > > > 4.. > > > "
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/23 10:50 AM, Jason Wang wrote: Hi: On 2023/2/22 14:46, Heng Qi wrote: Hi, Jason. Long time no see. :) On 2023/2/22 11:22 AM, Jason Wang wrote: On 2023/2/22 01:50, Michael S. Tsirkin wrote: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS} +There may be potential security risks when encapsulated packets use RSS to +select queues for placement. When a user inside a tunnel tries to control the

What do you mean by "user" here? Is it a remote or local one?

I mean a remote attacker who is not under the control of the tunnel owner.

Does anything make the tunnel different? I think this can happen even without a tunnel (and even with a single queue).

I agree. How to mitigate those attackers seems more like an implementation detail which might require fair queuing or other QoS technology that has been well studied. I am also not sure whether this point needs to be focused on in the spec, and I see that protection against tunnel DoS is mostly done outside the device, but it seems okay to give some attack reminders. Thanks.

It seems out of the scope of the spec (unless we want to let the driver manage QoS). Thanks

Thanks.

+enqueuing of encapsulated packets, then the user can flood the device with invalid +packets, and the flooded packets may be hashed into the same queue as packets in +other normal tunnels, causing the queue to overflow. + +This can pose several security risks: +\begin{itemize} +\item Encapsulated packets in the normal tunnels cannot be enqueued due to queue + overflow, resulting in a large amount of packet loss. +\item The delay and retransmission of packets in the normal tunnels are greatly increased. +\item The user can observe the traffic information and enqueue information of other normal + tunnels, and conduct targeted DoS attacks. +\end{itemize} +

Hmm with this all written out it sounds pretty severe.
I think we first need to understand whether or not it's a problem that we need to solve at the spec level: 1) does anything make encapsulated packets different, or why can't we hit this problem without encapsulation 2) whether or not it's an implementation detail that the spec doesn't need to care about (or how it is solved in real NICs) Thanks

At this point, with no way to mitigate, I don't feel this is something e.g. Linux can enable. I am not going to nack the spec patch if others find this somehow useful, e.g. for dpdk. How about CCing e.g. dpdk devs or whoever else is going to use this and asking them for their opinion?
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/23 12:41, Heng Qi wrote: On 2023/2/23 10:50 AM, Jason Wang wrote: Hi: On 2023/2/22 14:46, Heng Qi wrote: Hi, Jason. Long time no see. :) On 2023/2/22 11:22 AM, Jason Wang wrote: On 2023/2/22 01:50, Michael S. Tsirkin wrote: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

+\subparagraph{Security risks between encapsulated packets and RSS} +There may be potential security risks when encapsulated packets use RSS to +select queues for placement. When a user inside a tunnel tries to control the

What do you mean by "user" here? Is it a remote or local one?

I mean a remote attacker who is not under the control of the tunnel owner.

Does anything make the tunnel different? I think this can happen even without a tunnel (and even with a single queue).

I agree. How to mitigate those attackers seems more like an implementation detail which might require fair queuing or other QoS technology that has been well studied. I am also not sure whether this point needs to be focused on in the spec, and I see that protection against tunnel DoS is mostly done outside the device, but it seems okay to give some attack reminders.

Maybe it's sufficient to say the device should ensure fairness among different flows when queuing packets? Thanks

Thanks.

It seems out of the scope of the spec (unless we want to let the driver manage QoS). Thanks

Thanks.

+enqueuing of encapsulated packets, then the user can flood the device with invalid +packets, and the flooded packets may be hashed into the same queue as packets in +other normal tunnels, causing the queue to overflow. + +This can pose several security risks: +\begin{itemize} +\item Encapsulated packets in the normal tunnels cannot be enqueued due to queue + overflow, resulting in a large amount of packet loss. +\item The delay and retransmission of packets in the normal tunnels are greatly increased.
+\item The user can observe the traffic information and enqueue information of other normal + tunnels, and conduct targeted DoS attacks. +\end{itemize} +

Hmm with this all written out it sounds pretty severe.

I think we first need to understand whether or not it's a problem that we need to solve at the spec level: 1) does anything make encapsulated packets different, or why can't we hit this problem without encapsulation 2) whether or not it's an implementation detail that the spec doesn't need to care about (or how it is solved in real NICs) Thanks

At this point, with no way to mitigate, I don't feel this is something e.g. Linux can enable. I am not going to nack the spec patch if others find this somehow useful, e.g. for dpdk. How about CCing e.g. dpdk devs or whoever else is going to use this and asking them for their opinion?
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Fri, Feb 24, 2023 at 10:45:13AM +0800, Jason Wang wrote: > > On 2023/2/23 12:41, Heng Qi wrote: > > > > > > On 2023/2/23 10:50 AM, Jason Wang wrote: > > > Hi: > > > > > > On 2023/2/22 14:46, Heng Qi wrote: > > > > Hi, Jason. Long time no see. :) > > > > > > > > On 2023/2/22 11:22 AM, Jason Wang wrote: > > > > > > > > > > On 2023/2/22 01:50, Michael S. Tsirkin wrote: > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: > > > > > > > +\subparagraph{Security risks between encapsulated packets and > > > > > > > RSS} > > > > > > > +There may be potential security risks when > > > > > > > encapsulated packets use RSS to > > > > > > > +select queues for placement. When a user inside a > > > > > > > tunnel tries to control the > > > > > > > > > > > > > > > What do you mean by "user" here? Is it a remote or local one? > > > > > > > > > > > > > I mean a remote attacker who is not under the control of the > > > > tunnel owner. > > > > > > > > > Does anything make the tunnel different? I think this can happen even > > > without a tunnel (and even with a single queue). > > > > I agree. > > > > > > > > How to mitigate those attackers seems more like an implementation > > > detail which might require fair queuing or other QoS technology > > > which has been well studied. > > > > I am also not sure whether this point needs to be focused on in the > > spec, and I see that protection against tunnel DoS is mostly done > > outside the device, > > but it seems okay to give some attack reminders. > > > Maybe it's sufficient to say the device should ensure fairness among > different flows when queuing packets? > > Thanks that isn't really achievable. > > > > > Thanks. > > > > > > > > It seems out of the scope of the spec (unless we want to let the driver > > > manage QoS). > > > > > > Thanks > > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > > > > +enqueuing of encapsulated packets, then the user > > > > > > > can flood the device with invalid > > > > > > > +packets, and the flooded packets may be hashed into > > > > > > > the same queue as packets in > > > > > > > +other normal tunnels, causing the queue to overflow. > > > > > > > + > > > > > > > +This can pose several security risks: > > > > > > > +\begin{itemize} > > > > > > > +\item Encapsulated packets in the normal tunnels > > > > > > > cannot be enqueued due to queue > > > > > > > + overflow, resulting in a large amount of packet loss. > > > > > > > +\item The delay and retransmission of packets in > > > > > > > the normal tunnels are greatly increased. > > > > > > > +\item The user can observe the traffic information > > > > > > > and enqueue information of other normal > > > > > > > + tunnels, and conduct targeted DoS attacks. > > > > > > > +\end{itemize} > > > > > > > + > > > > > > Hmm with this all written out it sounds pretty severe. > > > > > > > > > > > > > > > I think we first need to understand whether or not it's a > > > > > problem that we need to solve at the spec level: > > > > > > > > > > 1) does anything make encapsulated packets different, or why we > > > > > can't hit this problem without encapsulation > > > > > > > > > > 2) whether or not it's an implementation detail that the > > > > > spec doesn't need to care about (or how it is solved in real NICs) > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > At this point, with no way to mitigate, I don't feel > > > > > > this is something > > > > > > e.g. Linux can enable. I am not going to nack the spec patch if > > > > > > others find this somehow useful e.g. for dpdk. > > > > > > How about CCing e.g. dpdk devs or whoever else is going to use this > > > > > > and asking them for their opinion?
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/24 4:13 PM, Michael S. Tsirkin wrote: On Thu, Feb 23, 2023 at 02:40:46PM +, Parav Pandit wrote: From: Michael S. Tsirkin Sent: Thursday, February 23, 2023 8:14 AM On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:

So for RSS specifically, we brainstormed with Amnon (Cc'd) and came up with an idea: RSS indirection table entries are 16 bit but only 15 bits are used to identify an RX queue. We can use the remaining bit as a "tunnel bit" to signal whether to use the inner or the outer hash for queue selection.

I further brainstormed internally with Saeed and Rony on this. The inner hash is only needed for GRE, IPIP etc. For VXLAN and NVGRE the Linux kernel transmit side uses the entropy of the source port of the outer header. It does that based on the inner header. Refer to [1] as one example. [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922

But I think the hash was requested for RSS with dpdk, no?

I think yes, at least the first customer to use the feature will probably be dpdk. :)

The lookup will work like this then: calculate outer hash if (rss[outer hash] & tunnel bit)

Tunnel bit, you mean tunneled packet, right?

This idea stores a bit in the indirection table which signals which of the hashes to use for rss.

This allows the inner hash to have the ability to select a queue and place packets on that queue (that is, parallel to RSS), which seems to be different from our discussion before v9. 🙁 Thanks.

then calculate inner hash return rss[inner hash] & ~tunnel bit

Why end with a tunnel bit?

This just clears the bit, so we end up with a vq number.

else return rss[outer hash]

This fixes the security issue, returning us back to the status quo: specific tunnels can be directed to separate queues.

The number of tunnels is far higher than the number of queues with a paravirt driver doing decap.

True. This seeks to get us back to where we were before the feature: the driver can send specific outer hashes to specific queues.
Outer hash collisions remain a problem.

This is for RSS. For hash reporting the indirection table is not used. Maybe it is enough to signal to the driver that the inner hash was used. We do need that signalling though. My question would be whether it's practical to implement in hardware.

In the above example, hw calculating a double hash is difficult without much gain. Either calculating on the inner or on the outer makes sense. Signaling whether it was calculated on the inner or the outer is fine because the hw tells exactly what it did.

This, in a sense, is what reporting the hash tunnel type did. Do you now think we need it?
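[Editor's sketch] The lookup pseudocode above can be written out as a small function. VIRTIO_NET_RSS_TUNNEL_BIT is a hypothetical name for the spare bit 15 of a 16-bit indirection table entry; nothing here is final spec text.

```c
#include <stdint.h>

/* Hypothetical use of the spare bit 15 in a 16-bit RSS indirection
 * table entry: when set, redo the lookup with the inner hash. */
#define VIRTIO_NET_RSS_TUNNEL_BIT 0x8000u

static uint16_t rss_select_queue(const uint16_t *table, uint32_t table_mask,
                                 uint32_t outer_hash, uint32_t inner_hash)
{
    uint16_t entry = table[outer_hash & table_mask];

    if (entry & VIRTIO_NET_RSS_TUNNEL_BIT)
        /* Clear the flag so only the 15-bit vq number remains. */
        return table[inner_hash & table_mask] & ~VIRTIO_NET_RSS_TUNNEL_BIT;

    return entry; /* plain outer-hash RSS */
}
```

The design point is that the outer hash is always computed first, and the table itself tells the device for which outer-hash buckets the (more expensive) inner hash is worth computing.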
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Fri, Feb 24, 2023 at 10:38:37PM +0800, Heng Qi wrote: > > > On 2023/2/24 4:13 PM, Michael S. Tsirkin wrote: > > On Thu, Feb 23, 2023 at 02:40:46PM +, Parav Pandit wrote: > > > > > > > From: Michael S. Tsirkin > > > > Sent: Thursday, February 23, 2023 8:14 AM > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: > > > > > > > So for RSS specifically, we brainstormed with Amnon (Cc'd) and came up > > > > with > > > > an idea: RSS indirection table entries are 16 bit but only 15 bits are > > > > used to > > > > identify an RX queue. > > > > We can use the remaining bit as a "tunnel bit" to signal whether to use > > > > the > > > > inner or the outer hash for queue selection. > > > > > > > I further brainstormed internally with Saeed and Rony on this. > > > > > > The inner hash is only needed for GRE, IPIP etc. > > > For VXLAN and NVGRE the Linux kernel transmit side uses the entropy of the > > > source port of the outer header. > > > It does that based on the inner header. > > > Refer to [1] as one example. > > > > > > [1] > > > https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922 > > But I think the hash was requested for RSS with dpdk, no? > > I think yes, at least the first customer to use the feature will probably > be dpdk. :) > > > > > > > > The lookup will work like this then: > > > > > > > > calculate outer hash > > > > if (rss[outer hash] & tunnel bit) > > > Tunnel bit, you mean tunneled packet, right? > > this idea stores a bit in the indirection table > > which signals which of the hashes to use for rss > > This allows the inner hash to have the ability to select a queue and place > packets on that queue (that is, parallel to RSS), > which seems to be different from our discussion before v9. 🙁 > > Thanks. Not exactly. The idea is that we start with the outer hash. Based on that we use the rss table to decide whether to use the inner hash.
Given that Parav claims it's difficult to implement in hardware, I'm not insisting this idea be included in the patchset. We can add it later. > > > > > > then > > > > calculate inner hash > > > > return rss[inner hash] & ~tunnel bit > > > Why end with a tunnel bit? > > > > this just clears the bit so we end up with a vq number. > > > > > > else > > > > return rss[outer hash] > > > > > > > > > > > > this fixes the security issue, returning us back to the status quo: > > > > specific tunnels can > > > > be directed to separate queues. > > > > > > > The number of tunnels is far higher than the number of queues with a > > > paravirt driver doing decap. > > True. This seeks to get us back to where we were before the feature: > > the driver can send specific outer hashes to specific queues. > > outer hash collisions remain a problem. > > > > > > > > This is for RSS. > > > > > > > > > > > > For hash reporting the indirection table is not used. > > > > Maybe it is enough to signal to the driver that the inner hash was used. > > > > We do need that signalling though. > > > > > > > > My question would be whether it's practical to implement in hardware. > > > In the above example, hw calculating a double hash is difficult without much > > > gain. > > > Either calculating on the inner or the outer makes sense. > > > > > > Signaling whether it was calculated on inner or outer is fine because hw > > > tells exactly what it did. > > This, in a sense, is what reporting the hash tunnel type did. > > Do you now think we need it? > >
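[Editor's sketch] The "source port entropy" technique referenced above for VXLAN/NVGRE/geneve (cf. the linked geneve.c) can be illustrated as follows. This mirrors the idea behind the Linux kernel's udp_flow_src_port(), not its exact code; the range constants are illustrative.

```c
#include <stdint.h>

/* Encap-side entropy: derive the outer UDP source port from a hash of
 * the inner headers, so that different inner flows get different outer
 * 5-tuples and receivers can RSS on the outer header alone. */
static uint16_t entropy_src_port(uint32_t inner_flow_hash)
{
    const uint16_t min_port = 49152;              /* ephemeral range */
    const uint16_t range    = 65535 - 49152 + 1;  /* 16384 ports */
    return min_port + (uint16_t)(inner_flow_hash % range);
}
```

This is why tunnels with an outer UDP transport header need the inner hash less urgently than GRE or IPIP, which have no outer port field to carry the entropy.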
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/28 4:52 PM, Michael S. Tsirkin wrote: On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote: On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin wrote: On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote: On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin wrote: On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote: Btw, this kind of 1:1 hash feature seems neither scalable nor flexible. It requires an endless extension of bits/fields. Modern NICs allow the user to customize the hash calculation; for virtio-net we can allow an eBPF program to classify the packets. It seems to be more flexible and scalable, and there's almost no maintenance burden in the spec (only bytecode is required, no need for any fancy features/interactions like maps), it is easy to migrate, etc. A prototype is also easy: tun/tap has had an eBPF classifier for years. Thanks Yea, BPF offload would be great to have. We have been discussing it for years though - security issues keep blocking it. *Maybe* it's finally going to be there but I'm not going to block this work waiting for BPF offload. And easily migrated is what BPF is not. Just to make sure we're on the same page: I meant to find a way to allow the driver/user to fully customize what it wants to hash/classify. Similar technologies based on proprietary solutions have been used by some vendors to allow the user to customize the classifier[1]. eBPF looks like a good open-source candidate for this (there could be others). But there could be many kinds of eBPF programs that could be offloaded. One famous one is XDP, which requires many features other than the bytecode/VM, like map access and tail calls. Starting from such a complicated type is hard. Instead, we can start from a simple type, that is, the eBPF classifier. All it needs is to pass the bytecode to the device; the device can choose to run it or compile it to what it can understand for classifying. We don't need maps, tail calls and other features. 
Until people start asking exactly for maps because they want state for their classifier? Yes, but let's compare the eBPF without maps with the static feature proposed here. It is much more scalable and flexible. And it makes sense - if you want e.g. load balancing you need stats which needs maps. Yes, but we know it's possible to have that (through the XDP offload). This is impossible with the approach proposed here. I'm not actually objecting. And at least we then don't need to worry about leaking info - it's not virtio leaking info it's the bpf program. I wonder what does Heng Qi think. Heng Qi would it work for your scenario? We are positive on ebpf, which looks adequate in our scenario. Although it currently has some problems in offloading, such as imperfect interfaces, unstable, and user-unfriendly ebpf codes may consume a lot of device resources. Device support for ebpf will also take time. Also, the presence of ebpf offload does not conflict with other solutions, eg we still have RSS. Our goal is to pass this patch first. For the support of ebpf offloading, we have not collected internal requirements for the time being, but it is indeed a good direction. Thanks. We don't need to worry about the security because of its simplicity: the eBPF program is only in charge of doing classification, no other interactions with the driver and packet modification is prohibited. The feature is limited only to the VM/bytecode abstraction itself. What's more, it's a good first step to achieve full eBPF offloading in the future. Thanks [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html Dave seems to have nacked this approach, no? 
I may be missing something, but looking at kernel commits, there are a few patches to support that: E.g. commit c7648810961682b9388be2dd041df06915647445 Author: Tony Nguyen Date: Mon Sep 9 06:47:44 2019 -0700 ice: Implement Dynamic Device Personalization (DDP) download And it has been used by DPDK drivers. Thanks If we are talking about netdev then this discussion has to take place on netdev. If it's dpdk this is more believable. -- MST
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example the geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel is called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to dedicate a part of its CPUs to parsing the outer headers (but not dropping them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. 1). During this process, the CPUs on the host are divided into two parts: one part is used as a forwarding node to parse the outer header, and its CPU utilization is low. The other part handles packets. 2). The entropy of the outer udp src port is not enough, that is, the queues are not widely distributed. 2. When there is an inner header hash, the gateway will directly help parse the outer header, and use the inner 5-tuple to calculate the inner hash. The tunneled packet is then handed over to the host. 1) All the CPUs of the host are used to process data packets, and there is no need to use some CPUs to forward and parse the outer header. 2) The entropy of the original 5-tuple is sufficient, and the queues are widely distributed. Thanks. 
Same goes for vxlan; I did not check further. So what is the problem? And which tunnel types actually suffer from the problem?
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin wrote: > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote: > > Kernel had already used the eBPF program for hashing, classifying > > various types of eBPF program other than XDP/socket filter > > (pass/drop). > > > > Thanks > > where is it used for hashing? I can see it is used by team/lb: static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv, struct sk_buff *skb) { struct bpf_prog *fp; uint32_t lhash; unsigned char *c; fp = rcu_dereference_bh(lb_priv->fp); if (unlikely(!fp)) return 0; lhash = bpf_prog_run(fp, skb); c = (char *) &lhash; return c[0] ^ c[1] ^ c[2] ^ c[3]; } But the point is that the return value is determined by the prog type (or the context). Thanks > > -- > MST > > > - > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote: > On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin wrote: > > > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote: > > > Kernel had already used the eBPF program for hashing, classifying > > > various types of eBPF program other than XDP/socket filter > > > (pass/drop). > > > > > > Thanks > > > > where is it used for hashing? > > I can see it is used by team/lb: > > static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv, > struct sk_buff *skb) > { > struct bpf_prog *fp; > uint32_t lhash; > unsigned char *c; > > fp = rcu_dereference_bh(lb_priv->fp); > if (unlikely(!fp)) > return 0; > lhash = bpf_prog_run(fp, skb); > c = (char *) &lhash; > return c[0] ^ c[1] ^ c[2] ^ c[3]; > } > > But the point is that the return value is determined by the prog type > (or the context). > > Thanks OK so assuming we do this, how will users program this exactly? Given this is not standard, which tools will be used to attach such a program to the device? > > > > -- > > MST > > > > > > - > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org > > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Thu, Mar 2, 2023 at 4:41 PM Michael S. Tsirkin wrote: > > On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote: > > On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin wrote: > > > > > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote: > > > > Kernel had already used the eBPF program for hashing, classifying > > > > various types of eBPF program other than XDP/socket filter > > > > (pass/drop). > > > > > > > > Thanks > > > > > > where is it used for hashing? > > > > I can see it is used by team/lb: > > > > static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv, > > struct sk_buff *skb) > > { > > struct bpf_prog *fp; > > uint32_t lhash; > > unsigned char *c; > > > > fp = rcu_dereference_bh(lb_priv->fp); > > if (unlikely(!fp)) > > return 0; > > lhash = bpf_prog_run(fp, skb); > > c = (char *) &lhash; > > return c[0] ^ c[1] ^ c[2] ^ c[3]; > > } > > > > But the point is that the return value is determined by the prog type > > (or the context). > > > > Thanks > > OK so assuming we do this, how will users program this exactly? For DPDK users, it could be integrated with the PMD. For kernel users, it probably requires a virtio-specific netlink or char device. > Given this is not standard, which tools will be used to attach such > a program to the device? vDPA tool? Thanks > > > > > > > > -- > > > MST
Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Thu, Mar 02, 2023 at 04:59:46PM +0800, Jason Wang wrote: > On Thu, Mar 2, 2023 at 4:41 PM Michael S. Tsirkin wrote: > > > > On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote: > > > On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin wrote: > > > > > > > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote: > > > > > Kernel had already used the eBPF program for hashing, classifying > > > > > various types of eBPF program other than XDP/socket filter > > > > > (pass/drop). > > > > > > > > > > Thanks > > > > > > > > where is it used for hashing? > > > > > > I can see it is used by team/lb: > > > > > > static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv, > > > struct sk_buff *skb) > > > { > > > struct bpf_prog *fp; > > > uint32_t lhash; > > > unsigned char *c; > > > > > > fp = rcu_dereference_bh(lb_priv->fp); > > > if (unlikely(!fp)) > > > return 0; > > > lhash = bpf_prog_run(fp, skb); > > > c = (char *) &lhash; > > > return c[0] ^ c[1] ^ c[2] ^ c[3]; > > > } > > > > > > But the point is that the return value is determined by the prog type > > > (or the context). > > > > > > Thanks > > > > OK so assuming we do this, how will users program this exactly? > > For DPDK users, it could be integrated with the PMD. > For kernel ueres, it probably requires a virtio specific netlink or char > device. > > > Given this is not standard, which tools will be used to attach such > > a program to the device? > > vDPA tool? > > Thanks Ugh. I think I'd like ethtool to work. > > > > > > > > > > > > -- > > > > MST > > > > > > > > > > > > - > > > > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org > > > > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org > > > > > > - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: > > > 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: > > > If the tunnel is used to encapsulate the packets, the hash calculated > > > using the outer header of the receive packets is always fixed for the > > > same flow packets, i.e. they will be steered to the same receive queue. > > Wait a second. How is this true? Does not everyone stick the > > inner header hash in the outer source port to solve this? > > Yes, you are right. That's what we did before the inner header hash, but it > has a performance penalty, which I'll explain below. > > > For example geneve spec says: > > > > it is necessary for entropy from encapsulated packets to be > > exposed in the tunnel header. The most common technique for this is > > to use the UDP source port > > The end point of the tunnel called the gateway (with DPDK on top of it). > > 1. When there is no inner header hash, entropy can be inserted into the udp > src port of the outer header of the tunnel, > and then the tunnel packet is handed over to the host. The host needs to > take out a part of the CPUs to parse the outer headers (but not drop them) > to calculate the inner hash for the inner payloads, > and then use the inner > hash to forward them to another part of the CPUs that are responsible for > processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? > 1). During this process, the CPUs on the host is divided into two parts, one > part is used as a forwarding node to parse the outer header, > and the CPU utilization is low. Another part handles packets. Some overhead is clearly involved in *sending* packets - to calculate the hash and stick it in the port number. This is, however, a separate problem and if you want to solve it then my suggestion would be to teach the *transmit* side about GRE offloads, so it can fill the source port in the card. 
> 2). The entropy of the source udp src port is not enough, that is, the queue > is not widely distributed. how isn't it enough? 16 bit is enough to cover all vqs ... > 2. When there is an inner header hash, the gateway will directly help parse > the outer header, and use the inner 5 tuples to calculate the inner hash. > The tunneled packet is then handed over to the host. > 1) All the CPUs of the host are used to process data packets, and there is > no need to use some CPUs to forward and parse the outer header. You really have to parse the outer header anyway, otherwise there's no tunneling. Unless you want to teach virtio to implement tunneling in hardware, which is something I'd find it easier to get behind. > 2) The entropy of the original quintuple is sufficient, and the queue is > widely distributed. It's exactly the same entropy, why would it be better? In fact you are taking out the outer hash entropy making things worse. > > Thanks. > > > > same goes for vxlan did not check further. > > > > so what is the problem? and which tunnel types actually suffer from the > > problem? > > > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. 
> > Subscribe: virtio-comment-subscr...@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org > List help: virtio-comment-h...@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/2/24 10:45 AM, Jason Wang wrote: On 2023/2/23 12:41, Heng Qi wrote: On 2023/2/23 10:50 AM, Jason Wang wrote: Hi: On 2023/2/22 14:46, Heng Qi wrote: Hi, Jason. Long time no see. :) On 2023/2/22 11:22 AM, Jason Wang wrote: On 2023/2/22 01:50, Michael S. Tsirkin wrote: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: +\subparagraph{Security risks between encapsulated packets and RSS} +There may be potential security risks when encapsulated packets use RSS to +select queues for placement. When a user inside a tunnel tries to control the What do you mean by "user" here? Is it a remote or local one? I mean a remote attacker who is not under the control of the tunnel owner. Anything that makes the tunnel different? I think this can happen even without a tunnel (and even with a single queue). I agree. How to mitigate those attackers seems more like an implementation detail which might require fair queuing or other QoS technology that has been well studied. I am also not sure whether this point needs to be focused on in the spec, and I see that protection against tunnel DoS is mostly handled outside the device, but it seems to be okay to give some attack reminders. Maybe it's sufficient to say the device should ensure fairness among different flows when queuing packets? Yes, maybe the device does not guarantee QoS or needs to guarantee enqueue fairness between flows. Thanks. Thanks Thanks. It seems out of the scope of the spec (unless we want to let the driver manage QoS). Thanks Thanks. +enqueuing of encapsulated packets, then the user can flood the device with invalid +packets, and the flooded packets may be hashed into the same queue as packets in +other normal tunnels, causing the queue to overflow. + +This can pose several security risks: +\begin{itemize} +\item Encapsulated packets in the normal tunnels cannot be enqueued due to queue + overflow, resulting in a large amount of packet loss. 
+\item The delay and retransmission of packets in the normal tunnels are greatly increased. +\item The user can observe the traffic information and enqueue information of other normal + tunnels, and conduct targeted DoS attacks. +\end{itemize} + Hmm, with this all written out it sounds pretty severe. I think we first need to understand whether or not it's a problem that we need to solve at the spec level: 1) anything that makes encapsulated packets different, i.e. why we can't hit this problem without encapsulation 2) whether or not it's an implementation detail that the spec doesn't need to care about (or how it is solved in real NICs) Thanks At this point, with no way to mitigate, I don't feel this is something e.g. Linux can enable. I am not going to nack the spec patch if others find this somehow useful, e.g. for dpdk. How about CCing e.g. dpdk devs or whoever else is going to use this and asking them for their opinion?
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote: On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example the geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel is called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to dedicate a part of its CPUs to parsing the outer headers (but not dropping them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? Assuming that the same flow includes a unidirectional flow a->b, or a bidirectional flow a->b and b->a, such a flow may be processed out of order by the gateway (DPDK): 1. In unidirectional mode, if the same flow is switched to another gateway for some reason, resulting in a different outer IP address, then this flow may be processed by different CPUs after reaching the host if there is no inner hash. 
So after the host receives the flow, it first uses the forwarding CPUs to compute the inner hash, and then uses the hash to ensure that the flow is processed by the same CPU. 2. In bidirectional mode, the a->b flow may go to gateway 1, and the b->a flow may go to gateway 2. In order to ensure that the same flow is processed by the same CPU, we still need the forwarding CPUs to compute the real inner hash (here, the hash key needs to be replaced with a symmetric hash key). 1). During this process, the CPUs on the host are divided into two parts: one part is used as a forwarding node to parse the outer header, and its CPU utilization is low. The other part handles packets. Some overhead is clearly involved in *sending* packets - to calculate the hash and stick it in the port number. This is, however, a separate problem and if you want to solve it then my suggestion would be to teach the *transmit* side about GRE offloads, so it can fill the source port in the card. 2). The entropy of the outer udp src port is not enough, that is, the queues are not widely distributed. how isn't it enough? 16 bits is enough to cover all vqs ... A 5-tuple brings more entropy than a single port, doesn't it? In fact, the inner hash of the physical network card used by the business team is indeed better than the udp port number of the outer header we modify now, but they did not give me the data. 2. When there is an inner header hash, the gateway will directly help parse the outer header, and use the inner 5-tuple to calculate the inner hash. The tunneled packet is then handed over to the host. 1) All the CPUs of the host are used to process data packets, and there is no need to use some CPUs to forward and parse the outer header. You really have to parse the outer header anyway, otherwise there's no tunneling. Unless you want to teach virtio to implement tunneling in hardware, which is something I'd find easier to get behind. There is no need to parse the outer header twice, because we use shared memory. 
2) The entropy of the original 5-tuple is sufficient, and the queues are widely distributed. It's exactly the same entropy, why would it be better? In fact you are taking out the outer hash entropy, making things worse. I don't get the point: why is the entropy of the inner 5-tuple and the outer tunnel header the same? Multiple streams have the same outer header. Thanks. Thanks. same goes for vxlan, did not check further. so what is the problem? and which tunnel types actually suffer from the problem?
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote: > > > 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: > > > > > > 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: > > > > > If the tunnel is used to encapsulate the packets, the hash calculated > > > > > using the outer header of the receive packets is always fixed for the > > > > > same flow packets, i.e. they will be steered to the same receive > > > > > queue. > > > > Wait a second. How is this true? Does not everyone stick the > > > > inner header hash in the outer source port to solve this? > > > Yes, you are right. That's what we did before the inner header hash, but > > > it > > > has a performance penalty, which I'll explain below. > > > > > > > For example geneve spec says: > > > > > > > > it is necessary for entropy from encapsulated packets to be > > > > exposed in the tunnel header. The most common technique for this > > > > is > > > > to use the UDP source port > > > The end point of the tunnel called the gateway (with DPDK on top of it). > > > > > > 1. When there is no inner header hash, entropy can be inserted into the > > > udp > > > src port of the outer header of the tunnel, > > > and then the tunnel packet is handed over to the host. The host needs to > > > take out a part of the CPUs to parse the outer headers (but not drop them) > > > to calculate the inner hash for the inner payloads, > > > and then use the inner > > > hash to forward them to another part of the CPUs that are responsible for > > > processing. > > I don't get this part. Leave inner hashes to the guest inside the > > tunnel, why is your host doing this? > > Assuming that the same flow includes a unidirectional flow a->b, or a > bidirectional flow a->b and b->a, > such flow may be out of order when processed by the gateway(DPDK): > > 1. 
In unidirectional mode, if the same flow is switched to another gateway > for some reason, resulting in different outer IP address, > then this flow may be processed by different CPUs after reaching the > host if there is no inner hash. So after the host receives the > flow, first use the forwarding CPUs to parse the inner hash, and then > use the hash to ensure that the flow is processed by the > same CPU. > 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may > go to gateway 2. In order to ensure that the same flow is > processed by the same CPU, we still need the forwarding CPUs to parse > the real inner hash(here, the hash key needs to be replaced with a symmetric > hash key). Oh, interesting. What are those gateways, how come there's an expectation that you can change their addresses and topology completely seamlessly without any reordering whatsoever? Isn't network topology change kind of guaranteed to change ordering sometimes? > > > > > 1). During this process, the CPUs on the host is divided into two parts, > > > one > > > part is used as a forwarding node to parse the outer header, > > > and the CPU utilization is low. Another part handles packets. > > Some overhead is clearly involved in *sending* packets - > > to calculate the hash and stick it in the port number. > > This is, however, a separate problem and if you want to > > solve it then my suggestion would be to teach the *transmit* > > side about GRE offloads, so it can fill the source port in the card. > > > > > 2). The entropy of the source udp src port is not enough, that is, the > > > queue > > > is not widely distributed. > > how isn't it enough? 16 bit is enough to cover all vqs ... > > A 5-tuple brings more entropy than a single port, doesn't it? But you don't need more for RSS, the indirection table is not that large. 
> In fact, the > inner hash of the physical network card used by > the business team is indeed better than the udp port number of the outer > header we modify now, but they did not give me the data. Admittedly, our hash value is 32 bits. > > > 2. When there is an inner header hash, the gateway will directly help > > > parse > > > the outer header, and use the inner 5 tuples to calculate the inner hash. > > > The tunneled packet is then handed over to the host. > > > 1) All the CPUs of the host are used to process data packets, and there is > > > no need to use some CPUs to forward and parse the outer header. > > You really have to parse the outer header anyway, > > otherwise there's no tunneling. > > Unless you want to teach virtio to implement tunneling > > in hardware, which is something I'd find it easier to > > get behind. > > There is no need to parse the outer header twice, because we use shared > memory. Shared with what? You need the outer header to identify the tunnel. > > > 2) The entropy of the original quintuple is sufficient, and the queue is > > widely distributed. > > It's exactly the same entropy, why would it be better? In fact you
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/10 上午3:36, Michael S. Tsirkin 写道: On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote: 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? Let's simplify some details and take a fresh look at two different scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). 1. In Scenario1, we can improve the processing performance of the same flow by implementing inner symmetric hashing. 
This is because even though client1 and client2 communicate bidirectionally through the same flow, their data may pass through and be encapsulated by different tunnels, resulting in the same flow being hashed to different queues and processed by different CPUs. To ensure consistency and optimized processing, we need to parse out the inner header and compute a symmetric hash on it using a special rss key. Sorry for not mentioning the inner symmetric hash before, in order to prevent the introduction of more concepts, but it is indeed a kind of inner hash. 2. In Scenario2 with GRE, the lack of outer transport headers means that flows between multiple communication pairs encapsulated by the same tunnel will all be hashed to the same queue. To address this, we need to implement inner hashing to improve the performance of RSS. By parsing and calculating the inner hash, different flows can be hashed to different queues. Thanks. Assuming that the same flow includes a unidirectional flow a->b, or a bidirectional flow a->b and b->a, such flow may be out of order when processed by the gateway (DPDK): 1. In unidirectional mode, if the same flow is switched to another gateway for some reason, resulting in different outer IP addresses, then this flow may be processed by different CPUs after reaching the host if there is no inner hash. So after the host receives the flow, first use the forwarding CPUs to parse the inner hash, and then use the hash to ensure that the flow is processed by the same CPU. 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may go to gateway 2. In order to ensure that the same flow is processed by the same CPU, we still need the forwarding CPUs to parse the real inner hash (here, the hash key needs to be replaced with a symmetric hash key). Oh interesting. What are those gateways, how come there's expectation that you can change their addresses and topology completely seamlessly without any reordering whatsoever?
Isn't network topology change kind of guaranteed to change ordering sometimes? 1). During this process, the CPUs on the host are divided into two parts, one part is used as a forwarding node to parse the outer header, and the CPU utilization is low. Another part handles packets. Some overhead is clearly involved in *sending* packets - to calculate the hash and stick it in the port number. This is, however, a separate problem and if you want to solve it then my suggestion would be to teach the *transmit* side about GRE offloads, so it can fill the source port in the card. 2). The entropy of the source udp src port is not enough, that is, the queue is not widely distributed. how isn't it enough? 16 bit is enough to cover all vqs ... A 5-tuple brings more entropy than a single port, doesn't it? But you don't need more for RSS, the indirection table is not that large. In fact, the inner hash of the physical network card used by the business team is indeed better than the udp port number of the outer header we modify now, but they did not give me the data. Admittedly, our hash value is 32 bit.
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > I don't get this part.
Leave inner hashes to the guest inside the > > > > tunnel, why is your host doing this? > > > Let's simplify some details and take a fresh look at two different > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). > > 1. In Scenario1, we can improve the processing performance of the same flow > by implementing inner symmetric hashing. > > This is because even though client1 and client2 communicate bidirectionally > through the same flow, their data may pass > > through and be encapsulated by different tunnels, resulting in the same flow > being hashed to different queues and processed by different CPUs. > > To ensure consistency and optimized processing, we need to parse out the > inner header and compute a symmetric hash on it using a special rss key. > > Sorry for not mentioning the inner symmetric hash before, in order to > prevent the introduction of more concepts, but it is indeed a kind of inner > hash. If parts of a flow go through different tunnels won't this cause reordering at the network level? Why is it so important to prevent it at the nic then? Or, since you are stressing symmetric hash, are you talking about TX and RX side going through different tunnels? > 2. In Scenario2 with GRE, the lack of outer transport headers means that > flows between multiple communication pairs encapsulated by the same tunnel > > will all be hashed to the same queue. To address this, we need to implement > inner hashing to improve the performance of RSS. By parsing and calculating > > the inner hash, different flows can be hashed to different queues. > > Thanks. > > Well 2 is at least inexact, there's flowID there. It's just 8 bit so not sufficient if there are more than 512 queues. Still 512 queues is quite a lot. Are you trying to solve for configurations with more than 512 queues then? 
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote: Let's simplify some details and take a fresh look at two different scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). 1. In Scenario1, we can improve the processing performance of the same flow by implementing inner symmetric hashing.
If parts of a flow go through different tunnels won't this cause reordering at the network level? Why is it so important to prevent it at the nic then? Or, since you are stressing symmetric hash, are you talking about TX and RX side going through different tunnels?

Yes, the directions client1->client2 and client2->client1 may go through different tunnels. Using inner symmetric hashing ensures that the same CPU processes both directions of the same flow, improving performance.

Well 2 is at least inexact, there's flowID there. It's just 8 bit

We use the most basic GRE header fields (not NVGRE), not even optional fields. There is also no flow id in the GRE header, should you be referring to NVGRE?

Thanks.
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> Yes, the directions client1->client2 and client2->client1 may go through
> different tunnels.
> Using inner symmetric hashing can satisfy the same CPU to process two
> directions of the same flow to improve performance.

Well sure but ...
are you just doing forwarding or inner processing too? If forwarding why do you care about matching TX and RX queues? If e2e processing can't you just store the incoming hash in the flow and reuse on TX? This is what Linux is doing...
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > Yes, the directions client1->client2 and client2->client1 may go through
> > different tunnels.
> > Using inner symmetric hashing can satisfy the same CPU to process two
> > directions of the same flow to improve performance.
> Well sure but ... are you just doing forwarding or inner processing too?

When there is an inner hash, there is no forwarding anymore.

> If forwarding why do you care about matching TX and RX queues? If e2e
> processing can't you just store the incoming hash in the flow and reuse
> on TX? This is what Linux is doing...

In fact, we are just matching on the same rx queue. The network topology is roughly as follows. The processing host will receive the packets sent from client1 and client2 respectively, then make some action judgments, and return them to client2 and client1 respectively.

    client1              client2
       |                    |
       |    __________      |
       +-->|  tunnel  |<----+
           |__________|
             |      |
             v      v
       +-----------------+
       | processing host |
       +-----------------+

Thanks.
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:
> In fact, we are just matching on the same rx queue. The network topology
> is roughly as follows. The processing host will receive the packets
> sent from client1 and client2 respectively, then make some action judgments,
> and return them to client2 and client1 respectively.

monitoring host would be a better term
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> We use the most basic GRE header fields (not NVGRE), not even optional
> fields.

I'd say yes, the most convincing usecase is with legacy GRE. Given that, do you need the rest of the protocols there? We can start with just legacy GRE (think about including IPv6 or not). Given how narrow this usecase is, I'd be fine with focusing just on this, and addressing more protocols down the road with something programmable like BPF. WDYT?

-- 
MST
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
在 2023/3/21 上午3:45, Michael S. Tsirkin 写道: On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote: On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote: On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote: 在 2023/3/15 下午7:58, Michael S. Tsirkin 写道: On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote: 在 2023/3/10 上午3:36, Michael S. Tsirkin 写道: On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote: 在 2023/3/8 下午10:39, Michael S. Tsirkin 写道: On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote: 在 2023/2/28 下午7:16, Michael S. Tsirkin 写道: On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote: If the tunnel is used to encapsulate the packets, the hash calculated using the outer header of the receive packets is always fixed for the same flow packets, i.e. they will be steered to the same receive queue. Wait a second. How is this true? Does not everyone stick the inner header hash in the outer source port to solve this? Yes, you are right. That's what we did before the inner header hash, but it has a performance penalty, which I'll explain below. For example geneve spec says: it is necessary for entropy from encapsulated packets to be exposed in the tunnel header. The most common technique for this is to use the UDP source port The end point of the tunnel called the gateway (with DPDK on top of it). 1. When there is no inner header hash, entropy can be inserted into the udp src port of the outer header of the tunnel, and then the tunnel packet is handed over to the host. The host needs to take out a part of the CPUs to parse the outer headers (but not drop them) to calculate the inner hash for the inner payloads, and then use the inner hash to forward them to another part of the CPUs that are responsible for processing. I don't get this part. Leave inner hashes to the guest inside the tunnel, why is your host doing this? 
Let's simplify some details and take a fresh look at two different scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2). 1. In Scenario1, we can improve the processing performance of the same flow by implementing inner symmetric hashing. This is because even though client1 and client2 communicate bidirectionally through the same flow, their data may pass through and be encapsulated by different tunnels, resulting in the same flow being hashed to different queues and processed by different CPUs. To ensure consistency and optimized processing, we need to parse out the inner header and compute a symmetric hash on it using a special rss key. Sorry for not mentioning the inner symmetric hash before, in order to prevent the introduction of more concepts, but it is indeed a kind of inner hash. If parts of a flow go through different tunnels won't this cause reordering at the network level? Why is it so important to prevent it at the nic then? Or, since you are stressing symmetric hash, are you talking about TX and RX side going through different tunnels? Yes, the directions client1->client2 and client2->client1 may go through different tunnels. Using inner symmetric hashing can satisfy the same CPU to process two directions of the same flow to improve performance. Well sure but ... are you just doing forwarding or inner processing too? When there is an inner hash, there is no forwarding anymore. If forwarding why do you care about matching TX and RX queues? If e2e In fact, we are just matching on the same rx queue. The network topology is roughly as follows. The processing host will receive the packets sent from client1 and client2 respectively, then make some action judgments, and return them to client2 and client1 respectively. client1 client2 | | | __ | +->| tunnel |<+ || | | | | | | v v +-+ | processing host | +-+ Thanks. monotoring host would be a better term Sure. I'm so sorry I didn't realize I missed this until I checked my emails. 
:( processing can't you just store the incoming hash in the flow and reuse on TX? This is what Linux is doing... 2. In Scenario 2 with GRE, the lack of outer transport headers means that flows between multiple communication pairs encapsulated by the same tunnel will all be hashed to the same queue. To address this, we need to implement inner hashing to improve RSS performance. By parsing the inner headers and calculating the inner hash, different flows can be hashed to different queues. Thanks. Well, 2 is at least inexact, there's a flowID there. It's just 8 bit. We use the most basic GRE header fields (not NVGRE), not even optional fields. There is also no flow ID in the GRE header; could you be referring to NVGRE? Thanks. so not sufficient if there are more than 512 queues. Still 512 queues is quite a lot. Are you trying to solve for c
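The "store the incoming hash in the flow and reuse it on TX" suggestion above (what Linux effectively does by recording the received hash in the skb) might look like this in a forwarding application. The flow-table lookup itself is elided and all names here are illustrative:

```c
#include <stdint.h>

/* Illustrative per-flow state: remember the RSS hash the device
 * reported when packets of this flow were first received. */
struct flow_entry {
    uint32_t rx_hash;   /* hash taken from the receive descriptor */
};

/* Select the TX queue from the stored RX hash so replies for the
 * flow stay on the queue (and hence CPU) that received it. */
static uint16_t tx_queue_for(const struct flow_entry *f, uint16_t nqueues)
{
    return (uint16_t)(f->rx_hash % nqueues);
}
```

Note this only helps once the RX side already spreads flows across queues; with legacy GRE and outer-header hashing, every flow in the tunnel reports the same hash, which is the gap inner hashing fills.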
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/3/21 3:48 AM, Michael S. Tsirkin wrote: On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote: We use the most basic GRE header fields (not NVGRE), not even optional fields. I'd say yes, the most convincing usecase is with legacy GRE. Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric hashing. Please consider this. Given that, do you need the rest of the protocols there? I checked the current tunneling protocols used for overlay networks and compared their respective RFC versions. They are:
1. GRE_rfc2784: This protocol is only specified for IPv4 and is used as either the payload or delivery protocol. Link: https://datatracker.ietf.org/doc/rfc2784/
2. GRE_rfc2890: Describes extensions by which two fields, Key and Sequence Number, can optionally be carried in the GRE header. Link: https://www.rfc-editor.org/rfc/rfc2890
3. GRE_rfc7676: IPv6 Support for Generic Routing Encapsulation (GRE). Specified for IPv6 and used as either the payload or delivery protocol. Note that this does not change the GRE header format or any behaviors specified by RFC 2784 or RFC 2890. Link: https://datatracker.ietf.org/doc/rfc7676/
4. GRE-in-UDP: GRE-in-UDP Encapsulation. Specifies a method of encapsulating network protocol packets within GRE and UDP headers; this encapsulation allows the UDP source port field to be used as an entropy field. Specified for IPv4 and IPv6, and used as either the payload or delivery protocol. Link: https://www.rfc-editor.org/rfc/rfc8086
5. VXLAN: Virtual eXtensible Local Area Network. Link: https://datatracker.ietf.org/doc/rfc7348/
6. VXLAN-GPE: Generic Protocol Extension for VXLAN, which extends Virtual eXtensible Local Area Network (VXLAN) via changes to the VXLAN header. Link: https://www.ietf.org/archive/id/draft-ietf-nvo3-vxlan-gpe-12.txt
7. GENEVE: Generic Network Virtualization Encapsulation.
Link: https://datatracker.ietf.org/doc/rfc8926/
8. IPIP: IP Encapsulation within IP. Link: https://www.rfc-editor.org/rfc/rfc2003
9. NVGRE: Network Virtualization Using Generic Routing Encapsulation. Link: https://www.rfc-editor.org/rfc/rfc7637.html
10. STT: Stateless Transport Tunneling. STT is particularly useful when some tunnel endpoints are in end-systems, as it utilizes the capabilities of the network interface card to improve performance. Link: https://www.ietf.org/archive/id/draft-davie-stt-08.txt
Among them, GRE_rfc2784, VXLAN and GENEVE are our internal requirements for inner header hashing. GRE_rfc2784 requires RSS hashing to different queues. For the monitoring scenario I mentioned, VXLAN or GRE_rfc2890 also needs inner symmetric hashing. I know you would prefer this feature to support only GRE_rfc2784, since it is the most convincing case for RSS. But RSS hashes packets of different flows to different queues and packets of the same flow to the same queue, so this doesn't distort the role of RSS, and I believe that for modern protocols like VXLAN and others, inner symmetric hashing is still a common requirement for other vendors using virtio devices. So, can we make this feature support all the protocols I checked above, so that vendors can choose which ones to support? This would also avoid, as much as possible, having to add new tunnel protocols in the near future. Do you think that's ok? Again: I'm so sorry I didn't realize I missed this until I checked my emails. :( We can start with just legacy GRE (think about whether to include IPv6 or not). Given how narrow this usecase is, I'd be fine with focusing just on this, and addressing more protocols down the road with something programmable like BPF. WDYT? - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
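For reference, the header layout behind the legacy GRE variants above is small enough to sketch (RFC 2784 base header plus the optional RFC 2890 fields); the struct names are my own, but the field layout follows the RFCs. Note the absence of any port field that could carry entropy, which is why outer-header RSS degenerates to a single queue:

```c
#include <stdint.h>

/* Base GRE header, RFC 2784: 4 bytes, with no transport ports at
 * all, unlike VXLAN/GENEVE whose outer UDP header can carry entropy. */
struct gre_base_hdr {
    uint16_t flags_ver;  /* C bit, reserved bits, 3-bit version */
    uint16_t protocol;   /* EtherType of the encapsulated payload */
} __attribute__((packed));

/* RFC 2890 extensions, appended only when the K and/or S flag bits
 * are set in flags_ver. */
struct gre_key_seq {
    uint32_t key;        /* present if the K bit is set */
    uint32_t sequence;   /* present if the S bit is set */
} __attribute__((packed));
```

A device hashing such a packet by outer header alone sees only source/destination IP, so every flow inside one tunnel maps to one queue; inner hashing is the only way to spread them.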
[virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On Thu, Mar 30, 2023 at 08:37:21PM +0800, Heng Qi wrote: > > > On 2023/3/21 3:48 AM, Michael S. Tsirkin wrote: > > On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote: > > > We use the most basic GRE header fields (not NVGRE), not even optional > > > fields. > > I'd say yes, the most convincing usecase is with legacy GRE. > > Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric > hashing. Please consider this. Using a specific key seems fragile though in that a different one is needed for e.g. ipv4 and ipv6. An issue with VXLAN and GENEVE, yes? Will support for XOR hashing address this sufficiently or is that not acceptable to you? Or alternatively a modified Toeplitz, e.g. this https://inbox.dpdk.org/dev/20190731123040.gg4...@6wind.com/ suggests Mellanox supports that. WDYT? -- MST
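XOR hashing sidesteps the fragile-key problem because swapping source and destination does not change an XOR. A related trick that makes any hash function symmetric, including keyed Toeplitz, is to canonicalize the tuple before hashing. The sketch below uses FNV-1a purely as a stand-in hash; the function names and the choice of FNV are assumptions for illustration, not part of the spec or of any vendor's implementation:

```c
#include <stdint.h>

/* FNV-1a over the bytes of a 32-bit word, standing in for the
 * device's real hash function. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Canonicalize the tuple (numerically smaller endpoint first) so
 * both directions of a flow feed identical bytes to the hash. */
static uint32_t symmetric_tuple_hash(uint32_t a1, uint16_t p1,
                                     uint32_t a2, uint16_t p2)
{
    if (a2 < a1 || (a1 == a2 && p2 < p1)) {
        uint32_t ta = a1; a1 = a2; a2 = ta;
        uint16_t tp = p1; p1 = p2; p2 = tp;
    }
    uint32_t h = 2166136261u;   /* FNV offset basis */
    h = mix32(h, a1);
    h = mix32(h, p1);
    h = mix32(h, a2);
    h = mix32(h, p2);
    return h;
}
```

Canonicalization keeps one key working for both IPv4 and IPv6 inputs, whereas the special-key approach needs a key constructed per input width.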
Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
On 2023/4/8 6:29 PM, Michael S. Tsirkin wrote: On Thu, Mar 30, 2023 at 08:37:21PM +0800, Heng Qi wrote: On 2023/3/21 3:48 AM, Michael S. Tsirkin wrote: On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote: We use the most basic GRE header fields (not NVGRE), not even optional fields. I'd say yes, the most convincing usecase is with legacy GRE. Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric hashing. Please consider this. Using a specific key seems fragile though in that a different one is needed for e.g. ipv4 and ipv6. An issue with VXLAN and GENEVE, yes? Yes. Will support for XOR hashing address this sufficiently or is that not acceptable to you? Or alternatively a modified Toeplitz, e.g. this This is a very good suggestion; I will follow up on this work, as I have mentioned in other threads. Thanks. https://inbox.dpdk.org/dev/20190731123040.gg4...@6wind.com/ suggests Mellanox supports that. WDYT?