Re: [PATCH] ibmveth: Support to enable LSO/CSO for Trunk VEA.
Re-sending as my earlier response had some HTML subparts.

Let me give some background before I answer your queries.

In the IBM PowerVM environment, the ibmveth driver supports large send and
checksum offload today, but only for virtual ethernet adapters (VEAs) that are
not configured in "Trunk mode". In trunk mode, neither checksum nor large send
offload can be enabled, and without these offloads the performance numbers are
poor. This patch enables these offloads for "Trunk" VEAs.

The following shows a typical configuration for network packet flow, when VMs
in a PowerVM server have their network virtualized and communicate with the
external world:

  VM (ibmveth) <=> PowerVM Hypervisor <=> PowerVM I/O Server VM
     (ibmveth in "Trunk mode" <=> OVS <=> Physical NIC) <=> External Network

As you can see, packets originating in a VM travel through the local ibmveth
driver to the PowerVM Hypervisor, which delivers them to the ibmveth driver
configured in "Trunk" mode in the I/O Server; from there OVS bridges them to
the external network via the physical NIC. To have large send and checksum
offload enabled end to end, from the VM all the way to the physical NIC,
ibmveth needs to support these offload capabilities when configured in "Trunk"
mode too.

Before this patch, when a VM communicated with the external network (in a
configuration similar to the above), throughput was poor (~1.5 Gbps); with the
patch, I see ~9.4 Gbps throughput for a 10G NIC (iperf used for measurements).

On 4/9/2017 12:15 AM, David Miller wrote:
> From: Sivakumar Krishnasamy
> Date: Fri, 7 Apr 2017 05:57:59 -0400
>
>> Enable largesend and checksum offload for ibmveth configured in trunk mode.
>> Added support to SKB frag_list in TX path by skb_linearize'ing such SKBs.
>>
>> Signed-off-by: Sivakumar Krishnasamy
>
> Why is linearization necessary?
>
> It would seem that the gains you get from GRO are nullified by linearizing
> the SKB and thus copying all the data around and allocating buffers.
When the physical NIC has GRO enabled and OVS bridges these packets, the OVS
vport send code ends up calling dev_queue_xmit, which in turn calls
validate_xmit_skb. validate_xmit_skb has the below code snippet:

	if (netif_needs_gso(skb, features)) {
		struct sk_buff *segs;

		segs = skb_gso_segment(skb, features); <=== Segments the GSO
							    packet into MTU
							    sized segments.

When the OVS outbound vport is ibmveth, netif_needs_gso returns true if the
SKB has a frag_list and the driver doesn't support it (the NETIF_F_FRAGLIST
feature). So, due to the above code, all packets received by ibmveth are of
MSS size or smaller.

On a 10G physical NIC, the maximum throughput achieved was 2.2 Gbps due to the
above segmentation in validate_xmit_skb. With the patch to linearize the SKB,
throughput increased to 9 Gbps (and ibmveth received packets without them
being segmented). This is a ~4X improvement, even though we end up allocating
buffers and copying data.

> Finally, all of that new checksumming stuff looks extremely suspicious.
> You have to explain why that is happening and why it isn't because this
> driver is doing something incorrectly.
>
> Thanks.

We are now enabling support for OVS and improving bridging performance in
IBM's PowerVM environment, which brings in these new offload requirements for
the ibmveth driver configured in Trunk mode.

Please let me know if you need more details.

Regards,
Siva K
Re: [PATCH] ibmveth: Support to enable LSO/CSO for Trunk VEA.
From: Sivakumar Krishnasamy
Date: Fri, 7 Apr 2017 05:57:59 -0400

> Enable largesend and checksum offload for ibmveth configured in trunk mode.
> Added support to SKB frag_list in TX path by skb_linearize'ing such SKBs.
>
> Signed-off-by: Sivakumar Krishnasamy

Why is linearization necessary?

It would seem that the gains you get from GRO are nullified by linearizing
the SKB and thus copying all the data around and allocating buffers.

Finally, all of that new checksumming stuff looks extremely suspicious.
You have to explain why that is happening and why it isn't because this
driver is doing something incorrectly.

Thanks.
[PATCH] ibmveth: Support to enable LSO/CSO for Trunk VEA.
Enable largesend and checksum offload for ibmveth configured in trunk mode.
Added support to SKB frag_list in TX path by skb_linearize'ing such SKBs.

Signed-off-by: Sivakumar Krishnasamy
---
 drivers/net/ethernet/ibm/ibmveth.c | 102 ++---
 drivers/net/ethernet/ibm/ibmveth.h |   1 +
 2 files changed, 85 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 72ab7b6..e1e238d 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -46,6 +46,8 @@
 #include <asm/vio.h>
 #include <asm/iommu.h>
 #include <asm/firmware.h>
+#include <net/tcp.h>
+#include <net/ip6_checksum.h>
 
 #include "ibmveth.h"
 
@@ -808,8 +810,7 @@ static int ibmveth_set_csum_offload(struct net_device *dev, u32 data)
 
 	ret = h_illan_attributes(adapter->vdev->unit_address, 0, 0, &ret_attr);
 
-	if (ret == H_SUCCESS && !(ret_attr & IBMVETH_ILLAN_ACTIVE_TRUNK) &&
-	    !(ret_attr & IBMVETH_ILLAN_TRUNK_PRI_MASK) &&
+	if (ret == H_SUCCESS &&
 	    (ret_attr & IBMVETH_ILLAN_PADDED_PKT_CSUM)) {
 		ret4 = h_illan_attributes(adapter->vdev->unit_address, clr_attr,
 					  set_attr, &ret_attr);
@@ -1040,6 +1041,15 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	dma_addr_t dma_addr;
 	unsigned long mss = 0;
 
+	/* veth doesn't handle frag_list, so linearize the skb.
+	 * When GRO is enabled SKB's can have frag_list.
+	 */
+	if (adapter->is_active_trunk &&
+	    skb_has_frag_list(skb) && __skb_linearize(skb)) {
+		netdev->stats.tx_dropped++;
+		goto out;
+	}
+
 	/*
 	 * veth handles a maximum of 6 segments including the header, so
 	 * we have to linearize the skb if there are more than this.
@@ -1064,9 +1074,6 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 
 	desc_flags = IBMVETH_BUF_VALID;
 
-	if (skb_is_gso(skb) && adapter->fw_large_send_support)
-		desc_flags |= IBMVETH_BUF_LRG_SND;
-
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		unsigned char *buf = skb_transport_header(skb) +
 				     skb->csum_offset;
@@ -1076,6 +1083,9 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 		/* Need to zero out the checksum */
 		buf[0] = 0;
 		buf[1] = 0;
+
+		if (skb_is_gso(skb) && adapter->fw_large_send_support)
+			desc_flags |= IBMVETH_BUF_LRG_SND;
 	}
 
 retry_bounce:
@@ -1128,7 +1138,7 @@ retry_bounce:
 		descs[i+1].fields.address = dma_addr;
 	}
 
-	if (skb_is_gso(skb)) {
+	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_is_gso(skb)) {
 		if (adapter->fw_large_send_support) {
 			mss = (unsigned long)skb_shinfo(skb)->gso_size;
 			adapter->tx_large_packets++;
@@ -1232,6 +1242,66 @@ static void ibmveth_rx_mss_helper(struct sk_buff *skb, u16 mss, int lrg_pkt)
 	}
 }
 
+static void ibmveth_rx_csum_helper(struct sk_buff *skb,
+				   struct ibmveth_adapter *adapter)
+{
+	struct iphdr *iph = NULL;
+	struct ipv6hdr *iph6 = NULL;
+	__be16 skb_proto = 0;
+	u16 iphlen = 0;
+	u16 iph_proto = 0;
+	u16 tcphdrlen = 0;
+
+	skb_proto = be16_to_cpu(skb->protocol);
+
+	if (skb_proto == ETH_P_IP) {
+		iph = (struct iphdr *)skb->data;
+
+		/* If the IP checksum is not offloaded and if the packet
+		 * is large send, the checksum must be rebuilt.
+		 */
+		if (iph->check == 0xffff) {
+			iph->check = 0;
+			iph->check = ip_fast_csum((unsigned char *)iph,
+						  iph->ihl);
+		}
+
+		iphlen = iph->ihl * 4;
+		iph_proto = iph->protocol;
+	} else if (skb_proto == ETH_P_IPV6) {
+		iph6 = (struct ipv6hdr *)skb->data;
+		iphlen = sizeof(struct ipv6hdr);
+		iph_proto = iph6->nexthdr;
+	}
+
+	/* In OVS environment, when a flow is not cached, specifically for a
+	 * new TCP connection, the first (SYN) packet information is passed up
+	 * the user space for finding a flow.
+	 * During this process, OVS computes
+	 * checksum on the packet when CHECKSUM_PARTIAL flag is set.
+	 * Given that we zeroed out TCP checksum field in transmit path as we
+	 * set "no checksum bit", OVS computed checksum will be incorrect w/o
+	 * TCP pseudo checksum in the packet.
+	 * So, re-compute TCP pseudo header checksum.
+	 */
+	if (iph_proto == IPPROTO_TCP && adapter->is_active_trunk) {
+		struct tcphdr *tcph = (struct tcphdr *)(skb->data + iphlen);
+
+		tcphdrlen = skb->len - iphlen;
+