On Thu, 2016-10-27 at 12:54 -0500, Thomas Falcon wrote:
> On 10/27/2016 10:26 AM, Eric Dumazet wrote:
> > On Wed, 2016-10-26 at 11:09 +1100, Jon Maxwell wrote:
> >> We recently encountered a bug where a few customers using ibmveth on the 
> >> same LPAR hit an issue where a TCP session hung when large receive was
> >> enabled. Closer analysis revealed that the session was stuck because
> >> one side was repeatedly advertising a zero window.
> >>
> >> We narrowed this down to the fact that the ibmveth driver did not set
> >> gso_size, which TCP translates into the MSS further up the stack. The MSS
> >> is used to calculate the TCP window size, and because the MSS was
> >> abnormally large, TCP calculated a zero window even though the socket's
> >> receive buffer was completely empty.
> >>
> >> We were able to reproduce this and worked with IBM to fix this. Thanks Tom 
> >> and Marcelo for all your help and review on this.
> >>
> >> The patch fixes both our internal reproduction tests and our customers'
> >> tests.
> >>
> >> Signed-off-by: Jon Maxwell <jmaxwel...@gmail.com>
> >> ---
> >>  drivers/net/ethernet/ibm/ibmveth.c | 20 ++++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
> >> index 29c05d0..c51717e 100644
> >> --- a/drivers/net/ethernet/ibm/ibmveth.c
> >> +++ b/drivers/net/ethernet/ibm/ibmveth.c
> >> @@ -1182,6 +1182,8 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
> >>    int frames_processed = 0;
> >>    unsigned long lpar_rc;
> >>    struct iphdr *iph;
> >> +  bool large_packet = 0;
> >> +  u16 hdr_len = ETH_HLEN + sizeof(struct tcphdr);
> >>  
> >>  restart_poll:
> >>    while (frames_processed < budget) {
> >> @@ -1236,10 +1238,28 @@ static int ibmveth_poll(struct napi_struct *napi, int budget)
> >>                                            iph->check = 0;
> >>                                            iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
> >>                                            adapter->rx_large_packets++;
> >> +                                          large_packet = 1;
> >>                                    }
> >>                            }
> >>                    }
> >>  
> >> +                  if (skb->len > netdev->mtu) {
> >> +                          iph = (struct iphdr *)skb->data;
> >> +                          if (be16_to_cpu(skb->protocol) == ETH_P_IP &&
> >> +                              iph->protocol == IPPROTO_TCP) {
> >> +                                  hdr_len += sizeof(struct iphdr);
> >> +                                  skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> >> +                                  skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> >> +                          } else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
> >> +                                     iph->protocol == IPPROTO_TCP) {
> >> +                                  hdr_len += sizeof(struct ipv6hdr);
> >> +                                  skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
> >> +                                  skb_shinfo(skb)->gso_size = netdev->mtu - hdr_len;
> >> +                          }
> >> +                          if (!large_packet)
> >> +                                  adapter->rx_large_packets++;
> >> +                  }
> >> +
> >>  
> > This might break forwarding and PMTU discovery.
> >
> > You force gso_size to device mtu, regardless of real MSS used by the TCP
> > sender.
> >
> > Don't you have the MSS provided in RX descriptor, instead of guessing
> > the value ?
> >
> >
> >
> The MSS is not always available unfortunately, so this is the best solution 
> there is at the moment. 

Hmm... then what about skb_shinfo(skb)->gso_segs? The patch leaves it unset.

ip_rcv(), for example, uses it for statistics:

        __IP_ADD_STATS(net,
                       IPSTATS_MIB_NOECTPKTS + (iph->tos & INET_ECN_MASK),
                       max_t(unsigned short, 1, skb_shinfo(skb)->gso_segs));
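
For illustration only, here is a minimal userspace sketch of how gso_segs
could be derived once gso_size is known: divide the TCP payload length by
gso_size and round up. The helper name and its parameters are hypothetical
and not taken from ibmveth; in the kernel the same arithmetic would normally
be written with DIV_ROUND_UP().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from ibmveth): number of segments an aggregated
 * TCP packet would split into, given the payload length (total length
 * minus all headers) and the per-segment payload size (gso_size).
 */
static uint16_t gso_segs_for(uint32_t payload_len, uint16_t gso_size)
{
	assert(gso_size != 0);	/* caller must have set gso_size first */

	/* Round up: a trailing partial segment still counts as one segment. */
	return (uint16_t)((payload_len + gso_size - 1) / gso_size);
}
```

With gso_size 1448, a 2896-byte payload gives 2 segments and a 2897-byte
payload gives 3.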



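Also, prefer tests of the form (skb->protocol == htons(ETH_P_IP)), so that
the byte swap is applied to the constant at compile time instead of to the
packet field at run time.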
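
A small userspace sketch of that comparison style follows. The SKETCH_*
constants mirror the kernel's ETH_P_IP and ETH_P_IPV6 values, and proto_be
stands in for skb->protocol; the function names are made up for this example.

```c
#include <arpa/inet.h>	/* htons() */
#include <stdbool.h>
#include <stdint.h>

/* Same values as the kernel's ETH_P_IP / ETH_P_IPV6, redefined here so the
 * sketch builds without kernel headers. */
#define SKETCH_ETH_P_IP		0x0800
#define SKETCH_ETH_P_IPV6	0x86DD

/* proto_be stands in for skb->protocol, which is kept in network byte order. */
static bool is_ipv4(uint16_t proto_be)
{
	/* Swap the constant, not the packet field. */
	return proto_be == htons(SKETCH_ETH_P_IP);
}

static bool is_ipv6(uint16_t proto_be)
{
	return proto_be == htons(SKETCH_ETH_P_IPV6);
}
```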

And the IPv6 test is wrong:

} else if (be16_to_cpu(skb->protocol) == ETH_P_IPV6 &&
           iph->protocol == IPPROTO_TCP) {

since iph points to an IPv4 header (struct iphdr), so iph->protocol does not
read the IPv6 next-header field.
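
To make the offsets concrete, here is a rough userspace sketch. The IPv4
protocol field is byte 9 of the header, while the IPv6 next-header field is
byte 6; byte 9 of an IPv6 header falls inside the source address. The helper
name, the SKETCH_* constants and the raw-buffer interface are assumptions
for illustration, and the IPv6 branch assumes no extension headers sit
between the IPv6 header and TCP.

```c
#include <arpa/inet.h>	/* htons() */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_P_IP		0x0800
#define SKETCH_ETH_P_IPV6	0x86DD
#define SKETCH_IPPROTO_TCP	6

/*
 * Hypothetical helper: l3 points at the start of the network header and
 * proto_be is the EtherType in network byte order (like skb->protocol).
 * Returns true when TCP directly follows the IP header.
 */
static bool l4_is_tcp(uint16_t proto_be, const uint8_t *l3, size_t len)
{
	if (proto_be == htons(SKETCH_ETH_P_IP)) {
		/* IPv4: protocol is byte 9 (iphdr->protocol). */
		return len >= 20 && l3[9] == SKETCH_IPPROTO_TCP;
	}
	if (proto_be == htons(SKETCH_ETH_P_IPV6)) {
		/* IPv6: next header is byte 6 (ipv6hdr->nexthdr).
		 * Assumption: no extension headers before TCP. */
		return len >= 40 && l3[6] == SKETCH_IPPROTO_TCP;
	}
	return false;
}
```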


