-邮件原件-
发件人: dev [mailto:ovs-dev-boun...@openvswitch.org] 代表 Ilya Maximets
发送时间: 2020年10月27日 21:12
收件人: yang_y...@163.com; ovs-dev@openvswitch.org
抄送: f...@sysclose.org; i.maxim...@ovn.org
主题: Re: [ovs-dev] [PATCH V3 3/4] Add VXLAN TCP and UDP GRO support for DPDK
data path
On 8/7/20 12:56 PM, yang_y...@163.com wrote:
> From: Yi Yang
>
> GRO(Generic Receive Offload) can help improve performance when TSO
> (TCP Segment Offload) or VXLAN TSO is enabled on transmit side, this
> can avoid overhead of ovs DPDK data path and enqueue vhost for VM by
> merging many small packets to large packets (65535 bytes at most) once
> it receives packets from physical NIC.
IIUC, this patch allows multi-segment mbufs to float across different parts of
OVS. This will definitely crash it somewhere. Much more changes all over the
OVS required to make it safely work with such mbufs. There were few attempts
to introduce this support, but all of them ended up being rejected. As it is
this patch is not acceptable as it doesn't cover almost anything beside simple
cases inside netdev implementation.
Here is the latest attempt with multi-segment mbufs:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=130193&state=*
Best regards, Ilya Maximets.
[Yi Yang] We have to support this because we have supported TSO for TCP, it
can't handle big UDP, this is why we must introduce GSO, the prerequisite of
GSO is multi-segment must be enabled because GSOed mbufs are multi-segmented,
but it is just last step before dpdk Tx, so I don't think it is an issue, per
my test in our openstack environment, I didn't encounter any crash, this just
enabled DPDK PMD driver to handle GSOed mbuf. For GRO, reassembling also use
chained multi-segment mbuf to avoid copy, per long time test, it also didn't
lead to any crash. We can fix some corner cases if they aren't covered.
>
> It can work for both VXLAN and vlan case.
>
> Signed-off-by: Yi Yang
> ---
> lib/dp-packet.h| 37 -
> lib/netdev-dpdk.c | 227
> -
> lib/netdev-linux.c | 112 --
> 3 files changed, 365 insertions(+), 11 deletions(-)
>
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h index c33868d..18307c0
> 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -580,7 +580,16 @@ dp_packet_set_size(struct dp_packet *b, uint32_t v)
> * (and thus 'v') will always be <= UINT16_MAX; this means that there is
> no
> * loss of accuracy in assigning 'v' to 'data_len'.
> */
> -b->mbuf.data_len = (uint16_t)v; /* Current seg length. */
> +if (b->mbuf.nb_segs <= 1) {
> +b->mbuf.data_len = (uint16_t)v; /* Current seg length. */
> +} else {
> +/* For multi-seg packet, if it is resize, data_len should be
> + * adjusted by offset, this will happend in case of push or pop.
> + */
> +if (b->mbuf.pkt_len != 0) {
> +b->mbuf.data_len += v - b->mbuf.pkt_len;
> +}
> +}
> b->mbuf.pkt_len = v; /* Total length of all segments linked
> to
>* this segment. */ } @@
> -1092,6 +1101,20 @@ dp_packet_hwol_set_l4_len(struct dp_packet *b, int
> l4_len) {
> b->mbuf.l4_len = l4_len;
> }
> +
> +/* Set outer_l2_len for the packet 'b' */ static inline void
> +dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int
> +outer_l2_len) {
> +b->mbuf.outer_l2_len = outer_l2_len; }
> +
> +/* Set outer_l3_len for the packet 'b' */ static inline void
> +dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int
> +outer_l3_len) {
> +b->mbuf.outer_l3_len = outer_l3_len; }
> #else
> /* Mark packet 'b' for VXLAN TCP segmentation offloading. */ static
> inline void @@ -1125,6 +1148,18 @@ dp_packet_hwol_set_l4_len(struct
> dp_packet *b OVS_UNUSED,
>int l4_len OVS_UNUSED) { }
> +
> +/* Set outer_l2_len for the packet 'b' */ static inline void
> +dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int
> +outer_l2_len) { }
> +
> +/* Set outer_l3_len for the packet 'b' */ static inline void
> +dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int
> +outer_l3_len) { }
> #endif /* DPDK_NETDEV */
>
> static inline bool
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 888a45e..b6c57a6 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -25,6 +25,7 @@
> #include
> #include
> #include
> +#include
>
> /* Include rte_compat.h first to allow experimental API's needed for the
> * rte_meter.h rfc4115 functions. Once they are no longer marked as
> @@ -47,6 +48,7 @@
> #include
> #include
> #include
> +#include
>
> #include "cmap.h"
> #include "coverage.h"
> @@ -2184,6 +2186,8 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr,
> uint16_t ethertype)
> }
> }
>
> +#define UDP_VXLAN_ETH_HDR_SIZE 30
> +
> /* Prepare the packet for HWOL.
> * Return True