TCP Segmentation Offload (TSO) is a feature that enables the TCP/IP network stack to delegate the segmentation of a large TCP segment to the NIC, thus saving compute resources.
This RFC proposes to add support for TSO to the MLX4 PMD.

Prerequisites:
For the PMD to recognize the TSO capabilities of the device, the following are required:
* RDMA-core v18.0 or above.
* Linux kernel 4.16 or above.

Assumptions:
* The mlx4 PMD will follow the TSO support implemented in the mlx5 PMD.
* The PMD is backwards compatible.
** The PMD will continue to work with the kernels and RDMA-core versions it supports today.
** The PMD will continue to work with devices not supporting TSO.

Changes proposed in the PMD for implementing TSO:
* At init, query the device for TSO support and for the MAX segment size it supports. The result also determines whether the PMD advertises support for TSO (dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;).
* Calling create-qp when creating a Tx queue will have to consider the MAX TSO header size when calculating the actual queue buffer size. This may be abstracted by calling ibv_create_qp_ex with IBV_QP_INIT_ATTR_MAX_TSO_HEADER as comp flag rather than ibv_create_qp. If this breaks backwards compatibility then this calculation will be done in the PMD code.
* Modify the tx_burst function to:
** Check for the TSO flag indication in the packets of the packet burst (buf->ol_flags & PKT_TX_TCP_SEG).
** For a TSO packet, create the WQE appropriate for sending a TSO packet and fill it with the packet info and the L2/L3/L4 headers.
* Modify the Tx completion function to handle releasing of TSO packet buffers that were transmitted.

Concerns:
* Impact of changing the Tx send routine on performance.
  The performance of the tx_burst routine for non-TSO packets may be affected just by placing the code that handles TSO packets in it, so we may want to consider having a dedicated routine for TSO packets.
* No MAX-TSO parameter.
  This is a cross-PMD issue that may need a separate mailing thread to handle.
  As of today there is no way for the PMD to advertise the MAX-TSO that it or its HW supports, as is done with other capabilities (the indirection table size, for example; see rte_eth_dev_info.reta_size in rte_ethdev.h). There is also no DPDK parameter or constant value that the PMD can use in order to know the MAX-TSO the system requires.
  This prevents applications from determining the MAX-TSO that can be used, leading to configuration mismatches that may cause transmit failures or, in the best case, a less-than-optimal TSO configuration.
  I propose to add a max_tso field in rte_eth_dev_info that will allow the PMD to advertise the max TSO it supports. DPDK applications can use it to determine what TSO size to use.
  If this is a major change that cannot fit the 18.08 schedule then I propose to add a MAX_TSO constant in rte_ethdev.h. The PMD will compare this value with its own MAX-TSO, and if it cannot meet the defined value it will not advertise that it is a TSO capable device.
* Handling packets longer than MAX-TSO.
  If a PMD is requested to send a TSO packet which is longer than MAX-TSO, the PMD send routine should return with an error.
  A different approach that can be used in the future is to apply GSO to those packets using the GSO lib in DPDK.

I am interested in general design comments and in feedback on the concerns listed above.

Signed-off-by: Moti Haimovsky <mo...@mellanox.com>
--
1.8.3.1
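For illustration only, the MAX-TSO handling discussed in the concerns could look roughly like the sketch below. It is not PMD code: every name in it (the sketch_* types, functions and flag macros) is hypothetical, the structures are simplified stand-ins for rte_eth_dev_info and the mbuf, and the choice of -EINVAL as the error value is an assumption. It shows the init-time decision whether to advertise TSO against a required MAX_TSO value, and the Tx-time rejection of a TSO packet longer than the advertised limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real DPDK definitions. */
#define SKETCH_TX_OFFLOAD_TCP_TSO (UINT64_C(1) << 0)
#define SKETCH_PKT_TX_TCP_SEG     (UINT64_C(1) << 0)

struct sketch_dev_info {
	uint64_t tx_offload_capa; /* advertised Tx offload capabilities */
	uint32_t max_tso;         /* proposed field: largest TSO packet, 0 if none */
};

struct sketch_pkt {
	uint64_t ol_flags; /* per-packet offload request flags */
	uint32_t pkt_len;  /* total packet length in bytes */
};

/*
 * Init time: advertise TSO only when the device supports it and its limit
 * meets the required value (the fallback MAX_TSO constant idea).
 */
static void
sketch_advertise_tso(struct sketch_dev_info *info, bool hw_tso,
		     uint32_t hw_max_tso, uint32_t required_max_tso)
{
	assert(info != NULL);
	info->max_tso = 0;
	if (hw_tso && hw_max_tso >= required_max_tso) {
		info->tx_offload_capa |= SKETCH_TX_OFFLOAD_TCP_TSO;
		info->max_tso = hw_max_tso;
	}
}

/*
 * Tx time: return 0 if the packet may be sent, or -EINVAL (assumed error
 * value) if it requests TSO while TSO is not advertised or the packet is
 * longer than the advertised MAX-TSO.
 */
static int
sketch_check_tso_pkt(const struct sketch_dev_info *info,
		     const struct sketch_pkt *pkt)
{
	if (!(pkt->ol_flags & SKETCH_PKT_TX_TCP_SEG))
		return 0; /* non-TSO packet: nothing to check here */
	if (!(info->tx_offload_capa & SKETCH_TX_OFFLOAD_TCP_TSO))
		return -EINVAL;
	if (pkt->pkt_len > info->max_tso)
		return -EINVAL;
	return 0;
}
```

An application-side check against the proposed max_tso field would be analogous: read the advertised value once after device info is queried, and cap the TSO packet size it builds accordingly.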