On Tue, Jan 14, 2020 at 03:41:57PM +0000, Stokes, Ian wrote: > > > On 1/9/2020 2:44 PM, Flavio Leitner wrote: > > Abbreviated as TSO, TCP Segmentation Offload is a feature which enables > > the network stack to delegate the TCP segmentation to the NIC reducing > > the per packet CPU overhead. > > > > A guest using a vhostuser interface with TSO enabled can send TCP packets > > much bigger than the MTU, which saves CPU cycles normally used to break > > the packets down to MTU size and to calculate checksums. > > > > It also saves CPU cycles used to parse multiple packets/headers during > > the packet processing inside the virtual switch. > > > > If the destination of the packet is another guest in the same host, then > > the same big packet can be sent through a vhostuser interface skipping > > the segmentation completely. However, if the destination is not local, > > the NIC hardware is instructed to do the TCP segmentation and checksum > > calculation. > > > > It is recommended to check if the NIC hardware supports TSO before enabling > > the feature, which is off by default. For additional information please > > check the tso.rst document. > > Thanks for the patch, Flavio. You've addressed my comments at least and I can > see that Ciara has tested the series. > > I think this will need to be rebased, however, as there has been a change to > netdev-linux to operate on batches rather than single packets. Can I ask you > to rebase the series for these changes?
Ok, will do. fbl > > @Ilya: I believe Flavio has addressed your comments to date but not sure if > you have more? > > Thanks > Ian > > > > Signed-off-by: Flavio Leitner <f...@sysclose.org> > > --- > > Documentation/automake.mk | 1 + > > Documentation/topics/dpdk/index.rst | 1 + > > Documentation/topics/dpdk/tso.rst | 96 +++++++++ > > NEWS | 1 + > > lib/automake.mk | 2 + > > lib/conntrack.c | 29 ++- > > lib/dp-packet.h | 152 +++++++++++++- > > lib/ipf.c | 32 +-- > > lib/netdev-dpdk.c | 312 ++++++++++++++++++++++++---- > > lib/netdev-linux-private.h | 4 + > > lib/netdev-linux.c | 296 +++++++++++++++++++++++--- > > lib/netdev-provider.h | 10 + > > lib/netdev.c | 66 +++++- > > lib/tso.c | 54 +++++ > > lib/tso.h | 23 ++ > > vswitchd/bridge.c | 2 + > > vswitchd/vswitch.xml | 12 ++ > > 17 files changed, 1002 insertions(+), 91 deletions(-) > > create mode 100644 Documentation/topics/dpdk/tso.rst > > create mode 100644 lib/tso.c > > create mode 100644 lib/tso.h > > > > Changelog: > > - v3 > > * Improved the documentation. > > * Updated copyright year to 2020. > > * TSO offloaded msg now includes the netdev's name. > > * Added period at the end of all code comments. > > * Warn and drop encapsulation of TSO packets. > > * Fixed travis issue with restricted virtio types. > > * Fixed double headroom allocation in dpdk_copy_dp_packet_to_mbuf() > > which caused packet corruption. > > * Fixed netdev_dpdk_prep_hwol_packet() to unconditionally set > > PKT_TX_IP_CKSUM only for IPv4 packets. > > > > diff --git a/Documentation/automake.mk b/Documentation/automake.mk > > index f2ca17bad..284327edd 100644 > > --- a/Documentation/automake.mk > > +++ b/Documentation/automake.mk > > @@ -35,6 +35,7 @@ DOC_SOURCE = \ > > Documentation/topics/dpdk/index.rst \ > > Documentation/topics/dpdk/bridge.rst \ > > Documentation/topics/dpdk/jumbo-frames.rst \ > > + Documentation/topics/dpdk/tso.rst \ > > Documentation/topics/dpdk/memory.rst \ > > Documentation/topics/dpdk/pdump.rst \ > > Documentation/topics/dpdk/phy.rst \ > > diff --git a/Documentation/topics/dpdk/index.rst > > b/Documentation/topics/dpdk/index.rst > > index f2862ea70..400d56051 100644 > > --- a/Documentation/topics/dpdk/index.rst > > +++ b/Documentation/topics/dpdk/index.rst > > @@ -40,4 +40,5 @@ DPDK Support > > /topics/dpdk/qos > > /topics/dpdk/pdump > > /topics/dpdk/jumbo-frames > > + /topics/dpdk/tso > > /topics/dpdk/memory > > diff --git a/Documentation/topics/dpdk/tso.rst > > b/Documentation/topics/dpdk/tso.rst > > new file mode 100644 > > index 000000000..189c86480 > > --- /dev/null > > +++ b/Documentation/topics/dpdk/tso.rst > > @@ -0,0 +1,96 @@ > > +.. > > + Copyright 2020, Red Hat, Inc. > > + > > + Licensed under the Apache License, Version 2.0 (the "License"); you > > may > > + not use this file except in compliance with the License. You may > > obtain > > + a copy of the License at > > + > > + http://www.apache.org/licenses/LICENSE-2.0 > > + > > + Unless required by applicable law or agreed to in writing, software > > + distributed under the License is distributed on an "AS IS" BASIS, > > WITHOUT > > + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See > > the > > + License for the specific language governing permissions and > > limitations > > + under the License. 
> > + > > + Convention for heading levels in Open vSwitch documentation: > > + > > + ======= Heading 0 (reserved for the title in a document) > > + ------- Heading 1 > > + ~~~~~~~ Heading 2 > > + +++++++ Heading 3 > > + ''''''' Heading 4 > > + > > + Avoid deeper levels because they do not render well. > > + > > +======================== > > +Userspace Datapath - TSO > > +======================== > > + > > +**Note:** This feature is considered experimental. > > + > > +TCP Segmentation Offload (TSO) enables a network stack to delegate > > segmentation > > +of an oversized TCP segment to the underlying physical NIC. Offload of > > frame > > +segmentation achieves computational savings in the core, freeing up CPU > > cycles > > +for more useful work. > > + > > +A common use case for TSO is when using virtualization, where traffic > > that's > > +coming in from a VM can offload the TCP segmentation, thus avoiding the > > +fragmentation in software. Additionally, if the traffic is headed to a VM > > +within the same host further optimization can be expected. As the traffic > > never > > +leaves the machine, no MTU needs to be accounted for, and thus no > > segmentation > > +and checksum calculations are required, which saves yet more cycles. Only > > when > > +the traffic actually leaves the host the segmentation needs to happen, in > > which > > +case it will be performed by the egress NIC. Consult your controller's > > +datasheet for compatibility. Secondly, the NIC must have an associated DPDK > > +Poll Mode Driver (PMD) which supports `TSO`. For a list of features per > > PMD, > > +refer to the `DPDK documentation`__. > > + > > +__ https://doc.dpdk.org/guides/nics/overview.html > > + > > +Enabling TSO > > +~~~~~~~~~~~~ > > + > > +The TSO support may be enabled via a global config value ``tso-support``. > > +Setting this to ``true`` enables TSO support for all ports. > > + > > + $ ovs-vsctl set Open_vSwitch . other_config:tso-support=true > > + > > +The default value is ``false``. > > + > > +Changing ``tso-support`` requires restarting the daemon. > > + > > +When using :doc:`vHost User ports <vhost-user>`, TSO may be enabled as > > follows. > > + > > +`TSO` is enabled in OvS by the DPDK vHost User backend; when a new guest > > +connection is established, `TSO` is thus advertised to the guest as an > > +available feature: > > + > > +QEMU Command Line Parameter:: > > + > > + $ sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \ > > + ... > > + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,\ > > + csum=on,guest_csum=on,guest_tso4=on,guest_tso6=on\ > > + ... > > + > > +2. Ethtool. Assuming that the guest's OS also supports `TSO`, ethtool can > > be > > +used to enable same:: > > + > > + $ ethtool -K eth0 sg on # scatter-gather is a prerequisite for TSO > > + $ ethtool -K eth0 tso on > > + $ ethtool -k eth0 > > + > > +~~~~~~~~~~~ > > +Limitations > > +~~~~~~~~~~~ > > + > > +The current OvS userspace `TSO` implementation supports flat and VLAN > > networks > > +only (i.e. no support for `TSO` over tunneled connection [VxLAN, GRE, > > IPinIP, > > +etc.]). > > + > > +There is no software implementation of TSO, so all ports attached to the > > +datapath must support TSO or packets using that feature will be dropped > > +on ports without TSO support. That also means guests using vhost-user > > +in client mode will receive TSO packet regardless of TSO being enabled > > +or disabled within the guest. 
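As a quick reference for anyone trying the series, the enabling flow described in tso.rst above condenses to the sketch below. The guest interface name ``eth0`` and the systemd unit name are assumptions, not part of the patch; adjust them for your setup::

    # Host: enable TSO support globally; the new other_config:tso-support
    # value only takes effect after restarting the daemon.
    $ ovs-vsctl set Open_vSwitch . other_config:tso-support=true
    $ systemctl restart openvswitch    # assumed unit name, use your distro's

    # Guest: with QEMU exposing csum/guest_csum/guest_tso4/guest_tso6 on the
    # virtio-net device (see the QEMU command line above), enable TSO there.
    $ ethtool -K eth0 sg on     # scatter-gather is a prerequisite for TSO
    $ ethtool -K eth0 tso on
    $ ethtool -k eth0 | grep tcp-segmentation-offload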
> > diff --git a/NEWS b/NEWS > > index 965facaf8..306c0493d 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -26,6 +26,7 @@ Post-v2.12.0 > > * DPDK ring ports (dpdkr) are deprecated and will be removed in next > > releases. > > * Add support for DPDK 19.11. > > + * Add experimental support for TSO. > > - RSTP: > > * The rstp_statistics column in Port table will only be updated every > > stats-update-interval configured in Open_vSwtich table. > > diff --git a/lib/automake.mk b/lib/automake.mk > > index ebf714501..94a1b4459 100644 > > --- a/lib/automake.mk > > +++ b/lib/automake.mk > > @@ -304,6 +304,8 @@ lib_libopenvswitch_la_SOURCES = \ > > lib/tnl-neigh-cache.h \ > > lib/tnl-ports.c \ > > lib/tnl-ports.h \ > > + lib/tso.c \ > > + lib/tso.h \ > > lib/netdev-native-tnl.c \ > > lib/netdev-native-tnl.h \ > > lib/token-bucket.c \ > > diff --git a/lib/conntrack.c b/lib/conntrack.c > > index b80080e72..679054b98 100644 > > --- a/lib/conntrack.c > > +++ b/lib/conntrack.c > > @@ -2022,7 +2022,8 @@ conn_key_extract(struct conntrack *ct, struct > > dp_packet *pkt, ovs_be16 dl_type, > > if (hwol_bad_l3_csum) { > > ok = false; > > } else { > > - bool hwol_good_l3_csum = dp_packet_ip_checksum_valid(pkt); > > + bool hwol_good_l3_csum = dp_packet_ip_checksum_valid(pkt) > > + || dp_packet_hwol_tx_ip_checksum(pkt); > > /* Validate the checksum only when hwol is not supported. */ > > ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), > > NULL, > > !hwol_good_l3_csum); > > @@ -2036,7 +2037,8 @@ conn_key_extract(struct conntrack *ct, struct > > dp_packet *pkt, ovs_be16 dl_type, > > if (ok) { > > bool hwol_bad_l4_csum = dp_packet_l4_checksum_bad(pkt); > > if (!hwol_bad_l4_csum) { > > - bool hwol_good_l4_csum = dp_packet_l4_checksum_valid(pkt); > > + bool hwol_good_l4_csum = dp_packet_l4_checksum_valid(pkt) > > + || > > dp_packet_hwol_tx_l4_checksum(pkt); > > /* Validate the checksum only when hwol is not supported. 
*/ > > if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt), > > &ctx->icmp_related, l3, !hwol_good_l4_csum, > > @@ -3237,8 +3239,11 @@ handle_ftp_ctl(struct conntrack *ct, const struct > > conn_lookup_ctx *ctx, > > } > > if (seq_skew) { > > ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew; > > - l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, > > - l3_hdr->ip_tot_len, > > htons(ip_len)); > > + if (!dp_packet_hwol_tx_ip_checksum(pkt)) { > > + l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, > > + l3_hdr->ip_tot_len, > > + htons(ip_len)); > > + } > > l3_hdr->ip_tot_len = htons(ip_len); > > } > > } > > @@ -3256,13 +3261,15 @@ handle_ftp_ctl(struct conntrack *ct, const struct > > conn_lookup_ctx *ctx, > > } > > th->tcp_csum = 0; > > - if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { > > - th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto, > > - dp_packet_l4_size(pkt)); > > - } else { > > - uint32_t tcp_csum = packet_csum_pseudoheader(l3_hdr); > > - th->tcp_csum = csum_finish( > > - csum_continue(tcp_csum, th, dp_packet_l4_size(pkt))); > > + if (!dp_packet_hwol_tx_l4_checksum(pkt)) { > > + if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { > > + th->tcp_csum = packet_csum_upperlayer6(nh6, th, > > ctx->key.nw_proto, > > + dp_packet_l4_size(pkt)); > > + } else { > > + uint32_t tcp_csum = packet_csum_pseudoheader(l3_hdr); > > + th->tcp_csum = csum_finish( > > + csum_continue(tcp_csum, th, dp_packet_l4_size(pkt))); > > + } > > } > > if (seq_skew) { > > diff --git a/lib/dp-packet.h b/lib/dp-packet.h > > index 133942155..d10a0416e 100644 > > --- a/lib/dp-packet.h > > +++ b/lib/dp-packet.h > > @@ -114,6 +114,8 @@ static inline void dp_packet_set_size(struct dp_packet > > *, uint32_t); > > static inline uint16_t dp_packet_get_allocated(const struct dp_packet *); > > static inline void dp_packet_set_allocated(struct dp_packet *, uint16_t); > > +void dp_packet_prepend_vnet_hdr(struct dp_packet *, int mtu); > > + > > void *dp_packet_resize_l2(struct dp_packet *, int increment); > > void *dp_packet_resize_l2_5(struct dp_packet *, int increment); > > static inline void *dp_packet_eth(const struct dp_packet *); > > @@ -456,7 +458,7 @@ dp_packet_init_specific(struct dp_packet *p) > > { > > /* This initialization is needed for packets that do not come from > > DPDK > > * interfaces, when vswitchd is built with --with-dpdk. */ > > - p->mbuf.tx_offload = p->mbuf.packet_type = 0; > > + p->mbuf.ol_flags = p->mbuf.tx_offload = p->mbuf.packet_type = 0; > > p->mbuf.nb_segs = 1; > > p->mbuf.next = NULL; > > } > > @@ -519,6 +521,80 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t > > s) > > b->mbuf.buf_len = s; > > } > > +static inline bool > > +dp_packet_hwol_is_tso(const struct dp_packet *b) > > +{ > > + return (b->mbuf.ol_flags & (PKT_TX_TCP_SEG | PKT_TX_L4_MASK)) > > + ? true > > + : false; > > +} > > + > > +static inline bool > > +dp_packet_hwol_is_ipv4(const struct dp_packet *b) > > +{ > > + return b->mbuf.ol_flags & PKT_TX_IPV4 ? true : false; > > +} > > + > > +static inline uint64_t > > +dp_packet_hwol_l4_mask(const struct dp_packet *b) > > +{ > > + return b->mbuf.ol_flags & PKT_TX_L4_MASK; > > +} > > + > > +static inline bool > > +dp_packet_hwol_l4_is_tcp(const struct dp_packet *b) > > +{ > > + return (b->mbuf.ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM > > + ? true > > + : false; > > +} > > + > > +static inline bool > > +dp_packet_hwol_l4_is_udp(struct dp_packet *b) > > +{ > > + return (b->mbuf.ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM > > + ? 
true > > + : false; > > +} > > + > > +static inline bool > > +dp_packet_hwol_l4_is_sctp(struct dp_packet *b) > > +{ > > + return (b->mbuf.ol_flags & PKT_TX_L4_MASK) == PKT_TX_SCTP_CKSUM > > + ? true > > + : false; > > +} > > + > > +static inline void > > +dp_packet_hwol_set_tx_ipv4(struct dp_packet *b) { > > + b->mbuf.ol_flags |= PKT_TX_IPV4; > > +} > > + > > +static inline void > > +dp_packet_hwol_set_tx_ipv6(struct dp_packet *b) { > > + b->mbuf.ol_flags |= PKT_TX_IPV6; > > +} > > + > > +static inline void > > +dp_packet_hwol_set_csum_tcp(struct dp_packet *b) { > > + b->mbuf.ol_flags |= PKT_TX_TCP_CKSUM; > > +} > > + > > +static inline void > > +dp_packet_hwol_set_csum_udp(struct dp_packet *b) { > > + b->mbuf.ol_flags |= PKT_TX_UDP_CKSUM; > > +} > > + > > +static inline void > > +dp_packet_hwol_set_csum_sctp(struct dp_packet *b) { > > + b->mbuf.ol_flags |= PKT_TX_SCTP_CKSUM; > > +} > > + > > +static inline void > > +dp_packet_hwol_set_tcp_seg(struct dp_packet *b) { > > + b->mbuf.ol_flags |= PKT_TX_TCP_SEG; > > +} > > + > > /* Returns the RSS hash of the packet 'p'. Note that the returned value > > is > > * correct only if 'dp_packet_rss_valid(p)' returns true */ > > static inline uint32_t > > @@ -648,6 +724,66 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t > > s) > > b->allocated_ = s; > > } > > +static inline bool > > +dp_packet_hwol_is_tso(const struct dp_packet *b OVS_UNUSED) > > +{ > > + return false; > > +} > > + > > +static inline bool > > +dp_packet_hwol_is_ipv4(const struct dp_packet *b OVS_UNUSED) > > +{ > > + return false; > > +} > > + > > +static inline uint64_t > > +dp_packet_hwol_l4_mask(const struct dp_packet *b OVS_UNUSED) > > +{ > > + return 0; > > +} > > + > > +static inline bool > > +dp_packet_hwol_l4_is_tcp(const struct dp_packet *b OVS_UNUSED) > > +{ > > + return false; > > +} > > + > > +static inline bool > > +dp_packet_hwol_l4_is_udp(const struct dp_packet *b OVS_UNUSED) > > +{ > > + return false; > > +} > > + > > +static inline bool > > +dp_packet_hwol_l4_is_sctp(const struct dp_packet *b OVS_UNUSED) > > +{ > > + return false; > > +} > > + > > +static inline void > > +dp_packet_hwol_set_tx_ipv4(struct dp_packet *b OVS_UNUSED) { > > +} > > + > > +static inline void > > +dp_packet_hwol_set_tx_ipv6(struct dp_packet *b OVS_UNUSED) { > > +} > > + > > +static inline void > > +dp_packet_hwol_set_csum_tcp(struct dp_packet *b OVS_UNUSED) { > > +} > > + > > +static inline void > > +dp_packet_hwol_set_csum_udp(struct dp_packet *b OVS_UNUSED) { > > +} > > + > > +static inline void > > +dp_packet_hwol_set_csum_sctp(struct dp_packet *b OVS_UNUSED) { > > +} > > + > > +static inline void > > +dp_packet_hwol_set_tcp_seg(struct dp_packet *b OVS_UNUSED) { > > +} > > + > > /* Returns the RSS hash of the packet 'p'. Note that the returned value > > is > > * correct only if 'dp_packet_rss_valid(p)' returns true */ > > static inline uint32_t > > @@ -939,6 +1075,20 @@ dp_packet_batch_reset_cutlen(struct dp_packet_batch > > *batch) > > } > > } > > +static inline bool > > +dp_packet_hwol_tx_ip_checksum(const struct dp_packet *p) > > +{ > > + > > + return dp_packet_hwol_l4_mask(p) ? true : false; > > +} > > + > > +static inline bool > > +dp_packet_hwol_tx_l4_checksum(const struct dp_packet *p) > > +{ > > + > > + return dp_packet_hwol_l4_mask(p) ? 
true : false; > > +} > > + > > #ifdef __cplusplus > > } > > #endif > > diff --git a/lib/ipf.c b/lib/ipf.c > > index 45c489122..0f43593a2 100644 > > --- a/lib/ipf.c > > +++ b/lib/ipf.c > > @@ -433,9 +433,11 @@ ipf_reassemble_v4_frags(struct ipf_list *ipf_list) > > len += rest_len; > > l3 = dp_packet_l3(pkt); > > ovs_be16 new_ip_frag_off = l3->ip_frag_off & > > ~htons(IP_MORE_FRAGMENTS); > > - l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_frag_off, > > - new_ip_frag_off); > > - l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_tot_len, htons(len)); > > + if (!dp_packet_hwol_tx_ip_checksum(pkt)) { > > + l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_frag_off, > > + new_ip_frag_off); > > + l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_tot_len, > > htons(len)); > > + } > > l3->ip_tot_len = htons(len); > > l3->ip_frag_off = new_ip_frag_off; > > dp_packet_set_l2_pad_size(pkt, 0); > > @@ -606,6 +608,7 @@ ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet > > *pkt) > > } > > if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(pkt) > > + && !dp_packet_hwol_tx_ip_checksum(pkt) > > && csum(l3, ip_hdr_len) != 0)) { > > goto invalid_pkt; > > } > > @@ -1181,16 +1184,21 @@ ipf_post_execute_reass_pkts(struct ipf *ipf, > > } else { > > struct ip_header *l3_frag = dp_packet_l3(frag_0->pkt); > > struct ip_header *l3_reass = dp_packet_l3(pkt); > > - ovs_be32 reass_ip = > > get_16aligned_be32(&l3_reass->ip_src); > > - ovs_be32 frag_ip = > > get_16aligned_be32(&l3_frag->ip_src); > > - l3_frag->ip_csum = recalc_csum32(l3_frag->ip_csum, > > - frag_ip, reass_ip); > > - l3_frag->ip_src = l3_reass->ip_src; > > + if (!dp_packet_hwol_tx_ip_checksum(frag_0->pkt)) { > > + ovs_be32 reass_ip = > > + get_16aligned_be32(&l3_reass->ip_src); > > + ovs_be32 frag_ip = > > + get_16aligned_be32(&l3_frag->ip_src); > > + > > + l3_frag->ip_csum = recalc_csum32(l3_frag->ip_csum, > > + frag_ip, > > reass_ip); > > + reass_ip = get_16aligned_be32(&l3_reass->ip_dst); > > + frag_ip = get_16aligned_be32(&l3_frag->ip_dst); > > + l3_frag->ip_csum = recalc_csum32(l3_frag->ip_csum, > > + frag_ip, > > reass_ip); > > + } > > - reass_ip = get_16aligned_be32(&l3_reass->ip_dst); > > - frag_ip = get_16aligned_be32(&l3_frag->ip_dst); > > - l3_frag->ip_csum = recalc_csum32(l3_frag->ip_csum, > > - frag_ip, reass_ip); > > + l3_frag->ip_src = l3_reass->ip_src; > > l3_frag->ip_dst = l3_reass->ip_dst; > > } > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index 5e09786ac..2de60aa3f 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -64,6 +64,7 @@ > > #include "smap.h" > > #include "sset.h" > > #include "timeval.h" > > +#include "tso.h" > > #include "unaligned.h" > > #include "unixctl.h" > > #include "util.h" > > @@ -360,7 +361,8 @@ struct ingress_policer { > > enum dpdk_hw_ol_features { > > NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, > > NETDEV_RX_HW_CRC_STRIP = 1 << 1, > > - NETDEV_RX_HW_SCATTER = 1 << 2 > > + NETDEV_RX_HW_SCATTER = 1 << 2, > > + NETDEV_TX_TSO_OFFLOAD = 1 << 3, > > }; > > /* > > @@ -942,6 +944,12 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int > > n_rxq, int n_txq) > > conf.rxmode.offloads |= DEV_RX_OFFLOAD_KEEP_CRC; > > } > > + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { > > + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO; > > + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM; > > + conf.txmode.offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM; > > + } > > + > > /* Limit configured rss hash functions to only those supported > > * by the eth device. 
*/ > > conf.rx_adv_conf.rss_conf.rss_hf &= info.flow_type_rss_offloads; > > @@ -1043,6 +1051,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) > > uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | > > DEV_RX_OFFLOAD_TCP_CKSUM | > > DEV_RX_OFFLOAD_IPV4_CKSUM; > > + uint32_t tx_tso_offload_capa = DEV_TX_OFFLOAD_TCP_TSO | > > + DEV_TX_OFFLOAD_TCP_CKSUM | > > + DEV_TX_OFFLOAD_IPV4_CKSUM; > > rte_eth_dev_info_get(dev->port_id, &info); > > @@ -1069,6 +1080,14 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) > > dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER; > > } > > + if (info.tx_offload_capa & tx_tso_offload_capa) { > > + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; > > + } else { > > + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; > > + VLOG_WARN("Tx TSO offload is not supported on %s port " > > + DPDK_PORT_ID_FMT, netdev_get_name(&dev->up), > > dev->port_id); > > + } > > + > > n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); > > n_txq = MIN(info.max_tx_queues, dev->up.n_txq); > > @@ -1319,14 +1338,16 @@ netdev_dpdk_vhost_construct(struct netdev *netdev) > > goto out; > > } > > - err = rte_vhost_driver_disable_features(dev->vhost_id, > > - 1ULL << VIRTIO_NET_F_HOST_TSO4 > > - | 1ULL << VIRTIO_NET_F_HOST_TSO6 > > - | 1ULL << VIRTIO_NET_F_CSUM); > > - if (err) { > > - VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user " > > - "port: %s\n", name); > > - goto out; > > + if (!tso_enabled()) { > > + err = rte_vhost_driver_disable_features(dev->vhost_id, > > + 1ULL << VIRTIO_NET_F_HOST_TSO4 > > + | 1ULL << VIRTIO_NET_F_HOST_TSO6 > > + | 1ULL << VIRTIO_NET_F_CSUM); > > + if (err) { > > + VLOG_ERR("rte_vhost_driver_disable_features failed for vhost > > user " > > + "port: %s\n", name); > > + goto out; > > + } > > } > > err = rte_vhost_driver_start(dev->vhost_id); > > @@ -1661,6 +1682,11 @@ netdev_dpdk_get_config(const struct netdev *netdev, > > struct smap *args) > > } else { > > smap_add(args, "rx_csum_offload", "false"); > > } > > + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { > > + smap_add(args, "tx_tso_offload", "true"); > > + } else { > > + smap_add(args, "tx_tso_offload", "false"); > > + } > > smap_add(args, "lsc_interrupt_mode", > > dev->lsc_interrupt_mode ? "true" : "false"); > > } > > @@ -2088,6 +2114,67 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq) > > rte_free(rx); > > } > > +/* Prepare the packet for HWOL. > > + * Return True if the packet is OK to continue. */ > > +static bool > > +netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf > > *mbuf) > > +{ > > + struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); > > + > > + if (mbuf->ol_flags & PKT_TX_L4_MASK) { > > + mbuf->l2_len = (char *)dp_packet_l3(pkt) - (char > > *)dp_packet_eth(pkt); > > + mbuf->l3_len = (char *)dp_packet_l4(pkt) - (char > > *)dp_packet_l3(pkt); > > + mbuf->outer_l2_len = 0; > > + mbuf->outer_l3_len = 0; > > + } > > + > > + if (mbuf->ol_flags & PKT_TX_TCP_SEG) { > > + struct tcp_header *th = dp_packet_l4(pkt); > > + > > + if (!th) { > > + VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" > > + " pkt len: %"PRIu32"", dev->up.name, > > mbuf->pkt_len); > > + return false; > > + } > > + > > + mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4; > > + mbuf->ol_flags |= PKT_TX_TCP_CKSUM; > > + mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; > > + > > + if (mbuf->ol_flags & PKT_TX_IPV4) { > > + mbuf->ol_flags |= PKT_TX_IP_CKSUM; > > + } > > + } > > + return true; > > +} > > + > > +/* Prepare a batch for HWOL. 
> > + * Return the number of good packets in the batch. */ > > +static int > > +netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf > > **pkts, > > + int pkt_cnt) > > +{ > > + int i = 0; > > + int cnt = 0; > > + struct rte_mbuf *pkt; > > + > > + /* Prepare and filter bad HWOL packets. */ > > + for (i = 0; i < pkt_cnt; i++) { > > + pkt = pkts[i]; > > + if (!netdev_dpdk_prep_hwol_packet(dev, pkt)) { > > + rte_pktmbuf_free(pkt); > > + continue; > > + } > > + > > + if (OVS_UNLIKELY(i != cnt)) { > > + pkts[cnt] = pkt; > > + } > > + cnt++; > > + } > > + > > + return cnt; > > +} > > + > > /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes > > ownership of > > * 'pkts', even in case of failure. > > * > > @@ -2097,11 +2184,22 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, > > int qid, > > struct rte_mbuf **pkts, int cnt) > > { > > uint32_t nb_tx = 0; > > + uint16_t nb_tx_prep = cnt; > > + > > + if (tso_enabled()) { > > + nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); > > + if (nb_tx_prep != cnt) { > > + VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " > > + "Only %u/%u are valid: %s", dev->up.name, > > nb_tx_prep, > > + cnt, rte_strerror(rte_errno)); > > + } > > + } > > - while (nb_tx != cnt) { > > + while (nb_tx != nb_tx_prep) { > > uint32_t ret; > > - ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - > > nb_tx); > > + ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, > > + nb_tx_prep - nb_tx); > > if (!ret) { > > break; > > } > > @@ -2386,11 +2484,14 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk > > *dev, struct rte_mbuf **pkts, > > int cnt = 0; > > struct rte_mbuf *pkt; > > + /* Filter oversized packets, unless are marked for TSO. */ > > for (i = 0; i < pkt_cnt; i++) { > > pkt = pkts[i]; > > - if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) { > > - VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " max_packet_len > > %d", > > - dev->up.name, pkt->pkt_len, dev->max_packet_len); > > + if (OVS_UNLIKELY((pkt->pkt_len > dev->max_packet_len) > > + && !(pkt->ol_flags & PKT_TX_TCP_SEG))) { > > + VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " > > + "max_packet_len %d", dev->up.name, pkt->pkt_len, > > + dev->max_packet_len); > > rte_pktmbuf_free(pkt); > > continue; > > } > > @@ -2442,7 +2543,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int > > qid, > > struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts; > > struct netdev_dpdk_sw_stats sw_stats_add; > > unsigned int n_packets_to_free = cnt; > > - unsigned int total_packets = cnt; > > + unsigned int total_packets; > > int i, retries = 0; > > int max_retries = VHOST_ENQ_RETRY_MIN; > > int vid = netdev_dpdk_get_vid(dev); > > @@ -2462,7 +2563,8 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int > > qid, > > rte_spinlock_lock(&dev->tx_q[qid].tx_lock); > > } > > - cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt); > > + total_packets = netdev_dpdk_prep_hwol_batch(dev, cur_pkts, cnt); > > + cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, total_packets); > > sw_stats_add.tx_mtu_exceeded_drops = total_packets - cnt; > > /* Check has QoS has been configured for the netdev */ > > @@ -2511,6 +2613,121 @@ out: > > } > > } > > +static void > > +netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque) > > +{ > > + rte_free(opaque); > > +} > > + > > +static struct rte_mbuf * > > +dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len) > > +{ > > + uint32_t total_len = RTE_PKTMBUF_HEADROOM + data_len; > > + struct 
rte_mbuf_ext_shared_info *shinfo = NULL; > > + uint16_t buf_len; > > + void *buf; > > + > > + if (rte_pktmbuf_tailroom(pkt) >= sizeof(*shinfo)) { > > + shinfo = rte_pktmbuf_mtod(pkt, struct rte_mbuf_ext_shared_info *); > > + } else { > > + total_len += sizeof(*shinfo) + sizeof(uintptr_t); > > + total_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); > > + } > > + > > + if (unlikely(total_len > UINT16_MAX)) { > > + VLOG_ERR("Can't copy packet: too big %u", total_len); > > + return NULL; > > + } > > + > > + buf_len = total_len; > > + buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); > > + if (unlikely(buf == NULL)) { > > + VLOG_ERR("Failed to allocate memory using rte_malloc: %u", > > buf_len); > > + return NULL; > > + } > > + > > + /* Initialize shinfo. */ > > + if (shinfo) { > > + shinfo->free_cb = netdev_dpdk_extbuf_free; > > + shinfo->fcb_opaque = buf; > > + rte_mbuf_ext_refcnt_set(shinfo, 1); > > + } else { > > + shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, > > + > > netdev_dpdk_extbuf_free, > > + buf); > > + if (unlikely(shinfo == NULL)) { > > + rte_free(buf); > > + VLOG_ERR("Failed to initialize shared info for mbuf while " > > + "attempting to attach an external buffer."); > > + return NULL; > > + } > > + } > > + > > + rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf), buf_len, > > + shinfo); > > + rte_pktmbuf_reset_headroom(pkt); > > + > > + return pkt; > > +} > > + > > +static struct rte_mbuf * > > +dpdk_pktmbuf_alloc(struct rte_mempool *mp, uint32_t data_len) > > +{ > > + struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp); > > + > > + if (OVS_UNLIKELY(!pkt)) { > > + return NULL; > > + } > > + > > + dp_packet_init_specific((struct dp_packet *)pkt); > > + if (rte_pktmbuf_tailroom(pkt) >= data_len) { > > + return pkt; > > + } > > + > > + if (dpdk_pktmbuf_attach_extbuf(pkt, data_len)) { > > + return pkt; > > + } > > + > > + rte_pktmbuf_free(pkt); > > + > > + return NULL; > > +} > > + > > +static struct dp_packet * > > +dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet > > *pkt_orig) > > +{ > > + struct rte_mbuf *mbuf_dest; > > + struct dp_packet *pkt_dest; > > + uint32_t pkt_len; > > + > > + pkt_len = dp_packet_size(pkt_orig); > > + mbuf_dest = dpdk_pktmbuf_alloc(mp, pkt_len); > > + if (OVS_UNLIKELY(mbuf_dest == NULL)) { > > + return NULL; > > + } > > + > > + pkt_dest = CONTAINER_OF(mbuf_dest, struct dp_packet, mbuf); > > + memcpy(dp_packet_data(pkt_dest), dp_packet_data(pkt_orig), pkt_len); > > + dp_packet_set_size(pkt_dest, pkt_len); > > + > > + mbuf_dest->tx_offload = pkt_orig->mbuf.tx_offload; > > + mbuf_dest->packet_type = pkt_orig->mbuf.packet_type; > > + mbuf_dest->ol_flags |= (pkt_orig->mbuf.ol_flags & > > + ~(EXT_ATTACHED_MBUF | IND_ATTACHED_MBUF)); > > + > > + memcpy(&pkt_dest->l2_pad_size, &pkt_orig->l2_pad_size, > > + sizeof(struct dp_packet) - offsetof(struct dp_packet, > > l2_pad_size)); > > + > > + if (mbuf_dest->ol_flags & PKT_TX_L4_MASK) { > > + mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest) > > + - (char *)dp_packet_eth(pkt_dest); > > + mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest) > > + - (char *) dp_packet_l3(pkt_dest); > > + } > > + > > + return pkt_dest; > > +} > > + > > /* Tx function. 
Transmit packets indefinitely */ > > static void > > dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch > > *batch) > > @@ -2524,7 +2741,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, > > struct dp_packet_batch *batch) > > enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST }; > > #endif > > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > - struct rte_mbuf *pkts[PKT_ARRAY_SIZE]; > > + struct dp_packet *pkts[PKT_ARRAY_SIZE]; > > struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats; > > uint32_t cnt = batch_cnt; > > uint32_t dropped = 0; > > @@ -2545,34 +2762,30 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, > > struct dp_packet_batch *batch) > > struct dp_packet *packet = batch->packets[i]; > > uint32_t size = dp_packet_size(packet); > > - if (OVS_UNLIKELY(size > dev->max_packet_len)) { > > - VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", > > - size, dev->max_packet_len); > > - > > + if (size > dev->max_packet_len > > + && !(packet->mbuf.ol_flags & PKT_TX_TCP_SEG)) { > > + VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", size, > > + dev->max_packet_len); > > mtu_drops++; > > continue; > > } > > - pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp); > > + pkts[txcnt] = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp, > > packet); > > if (OVS_UNLIKELY(!pkts[txcnt])) { > > dropped = cnt - i; > > break; > > } > > - /* We have to do a copy for now */ > > - memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *), > > - dp_packet_data(packet), size); > > - dp_packet_set_size((struct dp_packet *)pkts[txcnt], size); > > - > > txcnt++; > > } > > if (OVS_LIKELY(txcnt)) { > > if (dev->type == DPDK_DEV_VHOST) { > > - __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) > > pkts, > > - txcnt); > > + __netdev_dpdk_vhost_send(netdev, qid, pkts, txcnt); > > } else { > > - tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt); > > + tx_failure += netdev_dpdk_eth_tx_burst(dev, qid, > > + (struct rte_mbuf > > **)pkts, > > + txcnt); > > } > > } > > @@ -2630,6 +2843,7 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, > > int batch_cnt = dp_packet_batch_size(batch); > > struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets; > > + batch_cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, batch_cnt); > > tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt); > > mtu_drops = batch_cnt - tx_cnt; > > qos_drops = tx_cnt; > > @@ -4345,6 +4559,12 @@ netdev_dpdk_reconfigure(struct netdev *netdev) > > rte_free(dev->tx_q); > > err = dpdk_eth_dev_init(dev); > > + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { > > + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; > > + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; > > + netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; > > + } > > + > > dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); > > if (!dev->tx_q) { > > err = ENOMEM; > > @@ -4374,6 +4594,11 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk > > *dev) > > dev->tx_q[0].map = 0; > > } > > + if (tso_enabled()) { > > + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; > > + VLOG_DBG("%s: TSO enabled on vhost port", > > netdev_get_name(&dev->up)); > > + } > > + > > netdev_dpdk_remap_txqs(dev); > > err = netdev_dpdk_mempool_configure(dev); > > @@ -4446,6 +4671,11 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > > *netdev) > > vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY; > > } > > + /* Enable External Buffers if TCP Segmentation Offload is enabled. 
> > */ > > + if (tso_enabled()) { > > + vhost_flags |= RTE_VHOST_USER_EXTBUF_SUPPORT; > > + } > > + > > err = rte_vhost_driver_register(dev->vhost_id, vhost_flags); > > if (err) { > > VLOG_ERR("vhost-user device setup failure for device %s\n", > > @@ -4470,14 +4700,20 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > > *netdev) > > goto unlock; > > } > > - err = rte_vhost_driver_disable_features(dev->vhost_id, > > - 1ULL << VIRTIO_NET_F_HOST_TSO4 > > - | 1ULL << VIRTIO_NET_F_HOST_TSO6 > > - | 1ULL << VIRTIO_NET_F_CSUM); > > - if (err) { > > - VLOG_ERR("rte_vhost_driver_disable_features failed for vhost > > user " > > - "client port: %s\n", dev->up.name); > > - goto unlock; > > + if (tso_enabled()) { > > + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; > > + netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; > > + netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; > > + } else { > > + err = rte_vhost_driver_disable_features(dev->vhost_id, > > + 1ULL << VIRTIO_NET_F_HOST_TSO4 > > + | 1ULL << VIRTIO_NET_F_HOST_TSO6 > > + | 1ULL << VIRTIO_NET_F_CSUM); > > + if (err) { > > + VLOG_ERR("rte_vhost_driver_disable_features failed for " > > + "vhost user client port: %s\n", dev->up.name); > > + goto unlock; > > + } > > } > > err = rte_vhost_driver_start(dev->vhost_id); > > diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h > > index f08159aa7..102548db7 100644 > > --- a/lib/netdev-linux-private.h > > +++ b/lib/netdev-linux-private.h > > @@ -37,10 +37,14 @@ > > struct netdev; > > +#define LINUX_RXQ_TSO_MAX_LEN 65536 > > + > > struct netdev_rxq_linux { > > struct netdev_rxq up; > > bool is_tap; > > int fd; > > + char *bufaux; /* Extra buffer to recv TSO pkt. */ > > + int bufaux_len; /* Extra buffer length. */ > > }; > > int netdev_linux_construct(struct netdev *); > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > > index 8a62f9d74..604cb6913 100644 > > --- a/lib/netdev-linux.c > > +++ b/lib/netdev-linux.c > > @@ -29,16 +29,18 @@ > > #include <linux/filter.h> > > #include <linux/gen_stats.h> > > #include <linux/if_ether.h> > > +#include <linux/if_packet.h> > > #include <linux/if_tun.h> > > #include <linux/types.h> > > #include <linux/ethtool.h> > > #include <linux/mii.h> > > #include <linux/rtnetlink.h> > > #include <linux/sockios.h> > > +#include <linux/virtio_net.h> > > #include <sys/ioctl.h> > > #include <sys/socket.h> > > +#include <sys/uio.h> > > #include <sys/utsname.h> > > -#include <netpacket/packet.h> > > #include <net/if.h> > > #include <net/if_arp.h> > > #include <net/route.h> > > @@ -72,6 +74,7 @@ > > #include "socket-util.h" > > #include "sset.h" > > #include "tc.h" > > +#include "tso.h" > > #include "timer.h" > > #include "unaligned.h" > > #include "openvswitch/vlog.h" > > @@ -501,6 +504,8 @@ static struct vlog_rate_limit rl = > > VLOG_RATE_LIMIT_INIT(5, 20); > > * changes in the device miimon status, so we can use atomic_count. */ > > static atomic_count miimon_cnt = ATOMIC_COUNT_INIT(0); > > +static int netdev_linux_parse_vnet_hdr(struct dp_packet *b); > > +static void netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu); > > static int netdev_linux_do_ethtool(const char *name, struct ethtool_cmd *, > > int cmd, const char *cmd_name); > > static int get_flags(const struct netdev *, unsigned int *flags); > > @@ -902,6 +907,13 @@ netdev_linux_common_construct(struct netdev *netdev_) > > /* The device could be in the same network namespace or in another > > one. 
*/ > > netnsid_unset(&netdev->netnsid); > > ovs_mutex_init(&netdev->mutex); > > + > > + if (tso_enabled()) { > > + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; > > + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; > > + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; > > + } > > + > > return 0; > > } > > @@ -961,6 +973,10 @@ netdev_linux_construct_tap(struct netdev *netdev_) > > /* Create tap device. */ > > get_flags(&netdev->up, &netdev->ifi_flags); > > ifr.ifr_flags = IFF_TAP | IFF_NO_PI; > > + if (tso_enabled()) { > > + ifr.ifr_flags |= IFF_VNET_HDR; > > + } > > + > > ovs_strzcpy(ifr.ifr_name, name, sizeof ifr.ifr_name); > > if (ioctl(netdev->tap_fd, TUNSETIFF, &ifr) == -1) { > > VLOG_WARN("%s: creating tap device failed: %s", name, > > @@ -1024,6 +1040,13 @@ static struct netdev_rxq * > > netdev_linux_rxq_alloc(void) > > { > > struct netdev_rxq_linux *rx = xzalloc(sizeof *rx); > > + if (tso_enabled()) { > > + rx->bufaux = xmalloc(LINUX_RXQ_TSO_MAX_LEN); > > + if (rx->bufaux) { > > + rx->bufaux_len = LINUX_RXQ_TSO_MAX_LEN; > > + } > > + } > > + > > return &rx->up; > > } > > @@ -1069,6 +1092,17 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_) > > goto error; > > } > > + if (tso_enabled()) { > > + error = setsockopt(rx->fd, SOL_PACKET, PACKET_VNET_HDR, &val, > > + sizeof val); > > + if (error) { > > + error = errno; > > + VLOG_ERR("%s: failed to enable vnet hdr in txq raw socket: > > %s", > > + netdev_get_name(netdev_), ovs_strerror(errno)); > > + goto error; > > + } > > + } > > + > > /* Set non-blocking mode. */ > > error = set_nonblocking(rx->fd); > > if (error) { > > @@ -1123,6 +1157,8 @@ netdev_linux_rxq_destruct(struct netdev_rxq *rxq_) > > if (!rx->is_tap) { > > close(rx->fd); > > } > > + > > + free(rx->bufaux); > > } > > static void > > @@ -1152,11 +1188,13 @@ auxdata_has_vlan_tci(const struct tpacket_auxdata > > *aux) > > } > > static int > > -netdev_linux_rxq_recv_sock(int fd, struct dp_packet *buffer) > > +netdev_linux_rxq_recv_sock(int fd, char *bufaux, int bufaux_len, > > + struct dp_packet *buffer) > > { > > - size_t size; > > + size_t std_len; > > + size_t total_len; > > ssize_t retval; > > - struct iovec iov; > > + struct iovec iov[2]; > > struct cmsghdr *cmsg; > > union { > > struct cmsghdr cmsg; > > @@ -1166,14 +1204,17 @@ netdev_linux_rxq_recv_sock(int fd, struct dp_packet > > *buffer) > > /* Reserve headroom for a single VLAN tag */ > > dp_packet_reserve(buffer, VLAN_HEADER_LEN); > > - size = dp_packet_tailroom(buffer); > > + std_len = dp_packet_tailroom(buffer); > > + total_len = std_len + bufaux_len; > > - iov.iov_base = dp_packet_data(buffer); > > - iov.iov_len = size; > > + iov[0].iov_base = dp_packet_data(buffer); > > + iov[0].iov_len = std_len; > > + iov[1].iov_base = bufaux; > > + iov[1].iov_len = bufaux_len; > > msgh.msg_name = NULL; > > msgh.msg_namelen = 0; > > - msgh.msg_iov = &iov; > > - msgh.msg_iovlen = 1; > > + msgh.msg_iov = iov; > > + msgh.msg_iovlen = 2; > > msgh.msg_control = &cmsg_buffer; > > msgh.msg_controllen = sizeof cmsg_buffer; > > msgh.msg_flags = 0; > > @@ -1184,11 +1225,26 @@ netdev_linux_rxq_recv_sock(int fd, struct dp_packet > > *buffer) > > if (retval < 0) { > > return errno; > > - } else if (retval > size) { > > + } else if (retval > total_len) { > > return EMSGSIZE; > > } > > - dp_packet_set_size(buffer, dp_packet_size(buffer) + retval); > > + if (retval > std_len) { > > + /* Build a single linear TSO packet. 
*/ > > + size_t extra_len = retval - std_len; > > + > > + dp_packet_set_size(buffer, dp_packet_size(buffer) + std_len); > > + dp_packet_prealloc_tailroom(buffer, extra_len); > > + memcpy(dp_packet_tail(buffer), bufaux, extra_len); > > + dp_packet_set_size(buffer, dp_packet_size(buffer) + extra_len); > > + } else { > > + dp_packet_set_size(buffer, dp_packet_size(buffer) + retval); > > + } > > + > > + if (tso_enabled() && netdev_linux_parse_vnet_hdr(buffer)) { > > + VLOG_WARN_RL(&rl, "Invalid virtio net header"); > > + return EINVAL; > > + } > > for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg; cmsg = CMSG_NXTHDR(&msgh, > > cmsg)) { > > const struct tpacket_auxdata *aux; > > @@ -1221,20 +1277,44 @@ netdev_linux_rxq_recv_sock(int fd, struct dp_packet > > *buffer) > > } > > static int > > -netdev_linux_rxq_recv_tap(int fd, struct dp_packet *buffer) > > +netdev_linux_rxq_recv_tap(int fd, char *bufaux, int bufaux_len, > > + struct dp_packet *buffer) > > { > > ssize_t retval; > > - size_t size = dp_packet_tailroom(buffer); > > + size_t std_len; > > + struct iovec iov[2]; > > + > > + std_len = dp_packet_tailroom(buffer); > > + iov[0].iov_base = dp_packet_data(buffer); > > + iov[0].iov_len = std_len; > > + iov[1].iov_base = bufaux; > > + iov[1].iov_len = bufaux_len; > > do { > > - retval = read(fd, dp_packet_data(buffer), size); > > + retval = readv(fd, iov, 2); > > } while (retval < 0 && errno == EINTR); > > if (retval < 0) { > > return errno; > > } > > - dp_packet_set_size(buffer, dp_packet_size(buffer) + retval); > > + if (retval > std_len) { > > + /* Build a single linear TSO packet. */ > > + size_t extra_len = retval - std_len; > > + > > + dp_packet_set_size(buffer, dp_packet_size(buffer) + std_len); > > + dp_packet_prealloc_tailroom(buffer, extra_len); > > + memcpy(dp_packet_tail(buffer), bufaux, extra_len); > > + dp_packet_set_size(buffer, dp_packet_size(buffer) + extra_len); > > + } else { > > + dp_packet_set_size(buffer, dp_packet_size(buffer) + retval); > > + } > > + > > + if (tso_enabled() && netdev_linux_parse_vnet_hdr(buffer)) { > > + VLOG_WARN_RL(&rl, "Invalid virtio net header"); > > + return EINVAL; > > + } > > + > > return 0; > > } > > @@ -1245,6 +1325,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct > > dp_packet_batch *batch, > > struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_); > > struct netdev *netdev = rx->up.netdev; > > struct dp_packet *buffer; > > + size_t buffer_len; > > ssize_t retval; > > int mtu; > > @@ -1252,12 +1333,18 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, > > struct dp_packet_batch *batch, > > mtu = ETH_PAYLOAD_MAX; > > } > > + buffer_len = VLAN_ETH_HEADER_LEN + mtu; > > + if (tso_enabled()) { > > + buffer_len += sizeof(struct virtio_net_hdr); > > + } > > + > > /* Assume Ethernet port. No need to set packet_type. */ > > - buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu, > > - DP_NETDEV_HEADROOM); > > + buffer = dp_packet_new_with_headroom(buffer_len, DP_NETDEV_HEADROOM); > > retval = (rx->is_tap > > - ? netdev_linux_rxq_recv_tap(rx->fd, buffer) > > - : netdev_linux_rxq_recv_sock(rx->fd, buffer)); > > + ? 
netdev_linux_rxq_recv_tap(rx->fd, rx->bufaux, > > rx->bufaux_len, > > + buffer) > > + : netdev_linux_rxq_recv_sock(rx->fd, rx->bufaux, > > rx->bufaux_len, > > + buffer)); > > if (retval) { > > if (retval != EAGAIN && retval != EMSGSIZE) { > > @@ -1302,7 +1389,7 @@ netdev_linux_rxq_drain(struct netdev_rxq *rxq_) > > } > > static int > > -netdev_linux_sock_batch_send(int sock, int ifindex, > > +netdev_linux_sock_batch_send(int sock, int ifindex, bool tso, int mtu, > > struct dp_packet_batch *batch) > > { > > const size_t size = dp_packet_batch_size(batch); > > @@ -1316,6 +1403,10 @@ netdev_linux_sock_batch_send(int sock, int ifindex, > > struct dp_packet *packet; > > DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > > + if (tso) { > > + netdev_linux_prepend_vnet_hdr(packet, mtu); > > + } > > + > > iov[i].iov_base = dp_packet_data(packet); > > iov[i].iov_len = dp_packet_size(packet); > > mmsg[i].msg_hdr = (struct msghdr) { .msg_name = &sll, > > @@ -1348,7 +1439,7 @@ netdev_linux_sock_batch_send(int sock, int ifindex, > > * on other interface types because we attach a socket filter to the rx > > * socket. */ > > static int > > -netdev_linux_tap_batch_send(struct netdev *netdev_, > > +netdev_linux_tap_batch_send(struct netdev *netdev_, bool tso, int mtu, > > struct dp_packet_batch *batch) > > { > > struct netdev_linux *netdev = netdev_linux_cast(netdev_); > > @@ -1365,10 +1456,15 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, > > } > > DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > > - size_t size = dp_packet_size(packet); > > + size_t size; > > ssize_t retval; > > int error; > > + if (tso) { > > + netdev_linux_prepend_vnet_hdr(packet, mtu); > > + } > > + > > + size = dp_packet_size(packet); > > do { > > retval = write(netdev->tap_fd, dp_packet_data(packet), size); > > error = retval < 0 ? 
errno : 0; > > @@ -1403,9 +1499,15 @@ netdev_linux_send(struct netdev *netdev_, int qid > > OVS_UNUSED, > > struct dp_packet_batch *batch, > > bool concurrent_txq OVS_UNUSED) > > { > > + bool tso = tso_enabled(); > > + int mtu = ETH_PAYLOAD_MAX; > > int error = 0; > > int sock = 0; > > + if (tso) { > > + netdev_linux_get_mtu__(netdev_linux_cast(netdev_), &mtu); > > + } > > + > > if (!is_tap_netdev(netdev_)) { > > if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) { > > error = EOPNOTSUPP; > > @@ -1424,9 +1526,9 @@ netdev_linux_send(struct netdev *netdev_, int qid > > OVS_UNUSED, > > goto free_batch; > > } > > - error = netdev_linux_sock_batch_send(sock, ifindex, batch); > > + error = netdev_linux_sock_batch_send(sock, ifindex, tso, mtu, > > batch); > > } else { > > - error = netdev_linux_tap_batch_send(netdev_, batch); > > + error = netdev_linux_tap_batch_send(netdev_, tso, mtu, batch); > > } > > if (error) { > > if (error == ENOBUFS) { > > @@ -6173,6 +6275,19 @@ af_packet_sock(void) > > close(sock); > > sock = -error; > > } > > + > > + if (tso_enabled()) { > > + int val = 1; > > + error = setsockopt(sock, SOL_PACKET, PACKET_VNET_HDR, &val, > > + sizeof val); > > + if (error) { > > + error = errno; > > + VLOG_ERR("failed to enable vnet hdr in raw socket: %s", > > + ovs_strerror(errno)); > > + close(sock); > > + sock = -error; > > + } > > + } > > } else { > > sock = -errno; > > VLOG_ERR("failed to create packet socket: %s", > > @@ -6183,3 +6298,136 @@ af_packet_sock(void) > > return sock; > > } > > + > > +static int > > +netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) > > +{ > > + struct eth_header *eth_hdr; > > + ovs_be16 eth_type; > > + int l2_len; > > + > > + eth_hdr = dp_packet_at(b, 0, ETH_HEADER_LEN); > > + if (!eth_hdr) { > > + return -EINVAL; > > + } > > + > > + l2_len = ETH_HEADER_LEN; > > + eth_type = eth_hdr->eth_type; > > + if (eth_type_vlan(eth_type)) { > > + struct vlan_header *vlan = dp_packet_at(b, l2_len, > > VLAN_HEADER_LEN); > > + > > + if (!vlan) { > > + return -EINVAL; > > + } > > + > > + eth_type = vlan->vlan_next_type; > > + l2_len += VLAN_HEADER_LEN; > > + } > > + > > + if (eth_type == htons(ETH_TYPE_IP)) { > > + struct ip_header *ip_hdr = dp_packet_at(b, l2_len, IP_HEADER_LEN); > > + > > + if (!ip_hdr) { > > + return -EINVAL; > > + } > > + > > + *l4proto = ip_hdr->ip_proto; > > + dp_packet_hwol_set_tx_ipv4(b); > > + } else if (eth_type == htons(ETH_TYPE_IPV6)) { > > + struct ovs_16aligned_ip6_hdr *nh6; > > + > > + nh6 = dp_packet_at(b, l2_len, IPV6_HEADER_LEN); > > + if (!nh6) { > > + return -EINVAL; > > + } > > + > > + *l4proto = nh6->ip6_ctlun.ip6_un1.ip6_un1_nxt; > > + dp_packet_hwol_set_tx_ipv6(b); > > + } > > + > > + return 0; > > +} > > + > > +static int > > +netdev_linux_parse_vnet_hdr(struct dp_packet *b) > > +{ > > + struct virtio_net_hdr *vnet = dp_packet_pull(b, sizeof *vnet); > > + uint16_t l4proto = 0; > > + > > + if (OVS_UNLIKELY(!vnet)) { > > + return -EINVAL; > > + } > > + > > + if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) { > > + return 0; > > + } > > + > > + if (netdev_linux_parse_l2(b, &l4proto)) { > > + return -EINVAL; > > + } > > + > > + if (vnet->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { > > + if (l4proto == IPPROTO_TCP) { > > + dp_packet_hwol_set_csum_tcp(b); > > + } else if (l4proto == IPPROTO_UDP) { > > + dp_packet_hwol_set_csum_udp(b); > > + } else if (l4proto == IPPROTO_SCTP) { > > + dp_packet_hwol_set_csum_sctp(b); > > + } > > + } > > + > > + if (l4proto && vnet->gso_type != 
VIRTIO_NET_HDR_GSO_NONE) { > > + uint8_t allowed_mask = VIRTIO_NET_HDR_GSO_TCPV4 > > + | VIRTIO_NET_HDR_GSO_TCPV6 > > + | VIRTIO_NET_HDR_GSO_UDP; > > + uint8_t type = vnet->gso_type & allowed_mask; > > + > > + if (type == VIRTIO_NET_HDR_GSO_TCPV4 > > + || type == VIRTIO_NET_HDR_GSO_TCPV6) { > > + dp_packet_hwol_set_tcp_seg(b); > > + } > > + } > > + > > + return 0; > > +} > > + > > +static void > > +netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu) > > +{ > > + struct virtio_net_hdr *vnet = dp_packet_push_zeros(b, sizeof *vnet); > > + > > + if ((dp_packet_size(b) > mtu) && dp_packet_hwol_is_tso(b)) { > > + uint16_t hdr_len = ((char *)dp_packet_l4(b) - (char > > *)dp_packet_eth(b)) > > + + TCP_HEADER_LEN; > > + > > + vnet->hdr_len = (OVS_FORCE __virtio16)hdr_len; > > + vnet->gso_size = (OVS_FORCE __virtio16)(mtu - hdr_len); > > + if (dp_packet_hwol_is_ipv4(b)) { > > + vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; > > + } else { > > + vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV6; > > + } > > + > > + } else { > > + vnet->flags = VIRTIO_NET_HDR_GSO_NONE; > > + } > > + > > + if (dp_packet_hwol_l4_mask(b)) { > > + vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; > > + vnet->csum_start = (OVS_FORCE __virtio16)((char *)dp_packet_l4(b) > > + - (char > > *)dp_packet_eth(b)); > > + > > + if (dp_packet_hwol_l4_is_tcp(b)) { > > + vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( > > + struct tcp_header, tcp_csum); > > + } else if (dp_packet_hwol_l4_is_udp(b)) { > > + vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( > > + struct udp_header, udp_csum); > > + } else if (dp_packet_hwol_l4_is_sctp(b)) { > > + vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( > > + struct sctp_header, sctp_csum); > > + } else { > > + VLOG_WARN_RL(&rl, "Unsupported L4 protocol"); > > + } > > + } > > +} > > diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h > > index f109c4e66..87c375b47 100644 > > --- a/lib/netdev-provider.h > > +++ b/lib/netdev-provider.h > > @@ -37,6 +37,12 @@ extern "C" { > > struct netdev_tnl_build_header_params; > > #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC > > +enum netdev_ol_flags { > > + NETDEV_TX_OFFLOAD_IPV4_CKSUM = 1 << 0, > > + NETDEV_TX_OFFLOAD_TCP_CKSUM = 1 << 1, > > + NETDEV_TX_OFFLOAD_TCP_TSO = 1 << 2, > > +}; > > + > > /* A network device (e.g. an Ethernet device). > > * > > * Network device implementations may read these members but should not > > modify > > @@ -51,6 +57,10 @@ struct netdev { > > * opening this device, and therefore got assigned to the "system" > > class */ > > bool auto_classified; > > + /* This bitmask of the offloading features enabled/supported by the > > + * supported by the netdev. */ > > + uint64_t ol_flags; > > + > > /* If this is 'true', the user explicitly specified an MTU for this > > * netdev. Otherwise, Open vSwitch is allowed to override it. */ > > bool mtu_user_config; > > diff --git a/lib/netdev.c b/lib/netdev.c > > index 405c98c68..998525875 100644 > > --- a/lib/netdev.c > > +++ b/lib/netdev.c > > @@ -782,6 +782,52 @@ netdev_get_pt_mode(const struct netdev *netdev) > > : NETDEV_PT_LEGACY_L2); > > } > > +/* Check if a 'packet' is compatible with 'netdev_flags'. > > + * If a packet is incompatible, return 'false' with the 'errormsg' > > + * pointing to a reason. 
*/ > > +static bool > > +netdev_send_prepare_packet(const uint64_t netdev_flags, > > + struct dp_packet *packet, char **errormsg) > > +{ > > + if (dp_packet_hwol_is_tso(packet) > > + && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) { > > + /* Fall back to GSO in software. */ > > + *errormsg = "No TSO support"; > > + return false; > > + } > > + > > + if (dp_packet_hwol_l4_mask(packet) > > + && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) { > > + /* Fall back to L4 csum in software. */ > > + *errormsg = "No L4 checksum support"; > > + return false; > > + } > > + > > + return true; > > +} > > + > > +/* Check if each packet in 'batch' is compatible with 'netdev' features, > > + * otherwise either fall back to software implementation or drop it. */ > > +static void > > +netdev_send_prepare_batch(const struct netdev *netdev, > > + struct dp_packet_batch *batch) > > +{ > > + struct dp_packet *packet; > > + size_t i, size = dp_packet_batch_size(batch); > > + > > + DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) { > > + char *errormsg = NULL; > > + > > + if (netdev_send_prepare_packet(netdev->ol_flags, packet, > > &errormsg)) { > > + dp_packet_batch_refill(batch, packet, i); > > + } else { > > + VLOG_WARN_RL(&rl, "%s: Packet dropped: %s", > > + errormsg ? errormsg : "Unsupported feature", > > + netdev_get_name(netdev)); > > + } > > + } > > +} > > + > > /* Sends 'batch' on 'netdev'. Returns 0 if successful (for every packet), > > * otherwise a positive errno value. Returns EAGAIN without blocking if > > * at least one the packets cannot be queued immediately. Returns > > EMSGSIZE > > @@ -811,8 +857,10 @@ int > > netdev_send(struct netdev *netdev, int qid, struct dp_packet_batch *batch, > > bool concurrent_txq) > > { > > - int error = netdev->netdev_class->send(netdev, qid, batch, > > - concurrent_txq); > > + int error; > > + > > + netdev_send_prepare_batch(netdev, batch); > > + error = netdev->netdev_class->send(netdev, qid, batch, concurrent_txq); > > if (!error) { > > COVERAGE_INC(netdev_sent); > > } > > @@ -878,9 +926,17 @@ netdev_push_header(const struct netdev *netdev, > > const struct ovs_action_push_tnl *data) > > { > > struct dp_packet *packet; > > - DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { > > - netdev->netdev_class->push_header(netdev, packet, data); > > - pkt_metadata_init(&packet->md, data->out_port); > > + size_t i, size = dp_packet_batch_size(batch); > > + > > + DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) { > > + if (!dp_packet_hwol_is_tso(packet)) { > > + netdev->netdev_class->push_header(netdev, packet, data); > > + pkt_metadata_init(&packet->md, data->out_port); > > + dp_packet_batch_refill(batch, packet, i); > > + } else { > > + VLOG_WARN_RL(&rl, "%s: Tunneling of TSO packet is not > > supported: " > > + "packet dropped", netdev_get_name(netdev)); > > + } > > } > > return 0; > > diff --git a/lib/tso.c b/lib/tso.c > > new file mode 100644 > > index 000000000..9dc15e146 > > --- /dev/null > > +++ b/lib/tso.c > > @@ -0,0 +1,54 @@ > > +/* > > + * Copyright (c) 2020 Red Hat, Inc. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. 
> > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > + * See the License for the specific language governing permissions and > > + * limitations under the License. > > + */ > > + > > +#include <config.h> > > + > > +#include "smap.h" > > +#include "ovs-thread.h" > > +#include "openvswitch/vlog.h" > > +#include "dpdk.h" > > +#include "tso.h" > > +#include "vswitch-idl.h" > > + > > +VLOG_DEFINE_THIS_MODULE(tso); > > + > > +static bool tso_support_enabled = false; > > + > > +void > > +tso_init(const struct smap *ovs_other_config) > > +{ > > + if (smap_get_bool(ovs_other_config, "tso-support", false)) { > > + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; > > + > > + if (ovsthread_once_start(&once)) { > > + if (dpdk_available()) { > > + VLOG_INFO("TCP Segmentation Offloading (TSO) support > > enabled"); > > + tso_support_enabled = true; > > + } else { > > + VLOG_ERR("TCP Segmentation Offloading (TSO) is unsupported > > " > > + "without enabling DPDK"); > > + tso_support_enabled = false; > > + } > > + ovsthread_once_done(&once); > > + } > > + } > > +} > > + > > +bool > > +tso_enabled(void) > > +{ > > + return tso_support_enabled; > > +} > > diff --git a/lib/tso.h b/lib/tso.h > > new file mode 100644 > > index 000000000..6594496ac > > --- /dev/null > > +++ b/lib/tso.h > > @@ -0,0 +1,23 @@ > > +/* > > + * Copyright (c) 2020 Red Hat Inc. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. > > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > + * See the License for the specific language governing permissions and > > + * limitations under the License. > > + */ > > + > > +#ifndef TSO_H > > +#define TSO_H 1 > > + > > +void tso_init(const struct smap *ovs_other_config); > > +bool tso_enabled(void); > > + > > +#endif /* tso.h */ > > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > > index 86c7b10a9..6d73922f6 100644 > > --- a/vswitchd/bridge.c > > +++ b/vswitchd/bridge.c > > @@ -65,6 +65,7 @@ > > #include "system-stats.h" > > #include "timeval.h" > > #include "tnl-ports.h" > > +#include "tso.h" > > #include "util.h" > > #include "unixctl.h" > > #include "lib/vswitch-idl.h" > > @@ -3285,6 +3286,7 @@ bridge_run(void) > > if (cfg) { > > netdev_set_flow_api_enabled(&cfg->other_config); > > dpdk_init(&cfg->other_config); > > + tso_init(&cfg->other_config); > > } > > /* Initialize the ofproto library. This only needs to run once, but > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > > index 0ec726c39..354dcabfa 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -690,6 +690,18 @@ > > once in few hours or a day or a week. > > </p> > > </column> > > + <column name="other_config" key="tso-support" > > + type='{"type": "boolean"}'> > > + <p> > > + Set this value to <code>true</code> to enable support for TSO > > (TCP > > + Segmentation Offloading). 
When TSO is enabled, vhost-user client > > + interfaces can transmit packets up to 64KB. > > + </p> > > + <p> > > + The default value is <code>false</code>. Changing this value > > requires > > + restarting the daemon. > > + </p> > > + </column> > > </group> > > <group title="Status"> > > <column name="next_cfg"> -- fbl