On Thu, 2016-10-27 at 12:54 -0500, Thomas Falcon wrote:
> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
> > On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
> >> We recently encountered a bug where a few customers using ibmveth on the
> >> same LPAR hit an issue where a TCP session hung when large receive was
> >> enabled. Closer analysis revealed that the session was stuck because
> >> one side was advertising a zero window repeatedly.
> >>
> >> We narrowed this down to the fact that the ibmveth driver did not set
> >> gso_size, which is translated by TCP into the MSS later up the stack.
> >> The MSS is used to calculate the TCP window size, and as that was
> >> abnormally large, a zero window was calculated even though the socket's
> >> receive buffer was completely empty.
> >>
> >> We were able to reproduce this and worked with IBM to fix it. Thanks Tom
> >> and Marcelo for all your help and review on this.
> >>
> >> The patch fixes both our internal reproduction tests and our customers'
> >> tests.
> >>
> >> Signed-off-by: Jon Maxwell <jmaxwel...@gmail.com>
> >> ---
> >>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> >> index 29c05d0..c51717e 100644
> >> --- a/drivers/net/ethernet/ibm/ibmveth.c
> >> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
> >>  	int frames_processed = 0;
> >>  	unsigned long lpar_rc;
> >>  	struct iphdr *iph;
> >> +	bool large_packet = 0;
> >> +	u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
> >>
> >>  restart_poll:
> >>  	while (frames_processed < budget) {
> >> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
> >>  					iph->check = 0;
> >>  					iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
> >>  					adapter->rx_large_packets++;
> >> +					large_packet = 1;
> >>  				}
> >>  			}
> >>  		}
> >>
> >> +		if (skb->len > netdev->mtu) {
> >> +			iph = (struct iphdr *)skb->data;
> >> +			if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
> >> +			    iph->protocol == IPPROTO_TCP) {
> >> +				hdr_len += sizeof(struct iphdr);
> >> +				skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> >> +				skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> >> +			} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
> >> +				   iph->protocol == IPPROTO_TCP) {
> >> +				hdr_len += sizeof(struct ipv6hdr);
> >> +				skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
> >> +				skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> >> +			}
> >> +			if (!large_packet)
> >> +				adapter->rx_large_packets++;
> >> +		}
> >> +
> >
> > This might break forwarding and PMTU discovery.
> >
> > You force gso_size to the device MTU, regardless of the real MSS used by
> > the TCP sender.
> >
> > Don't you have the MSS provided in the RX descriptor, instead of guessing
> > the value?
>
> The MSS is not always available unfortunately, so this is the best
> solution there is at the moment.
Hmm... then what about skb_shinfo(skb)->gso_segs ?

ip_rcv() for example has :

	__IP_ADD_STATS(net,
		       IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
		       max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));

Also prefer :

	(skb->protocol == htons(ETH_P_IP))

tests.

And the ipv6 test is wrong :

	} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
		   iph->protocol == IPPROTO_TCP) {

Since iph is a pointer to an ipv4 iphdr.