Our support team has encountered a case where ibmveth + openvswitch + bnx2x
has led to some issues, which IBM should probably be aware of before
turning on large segments in more places.

Here's a summary from support for that issue:

==========

[Issue: we see a firmware assertion from an IBM-branded bnx2x card.
Decoding the dump with the help of upstream shows that the assert is
caused by a packet with GSO on and gso_size > ~9700 bytes being passed
to the card. We traced the packets through the system, and came up
with this root cause. The system uses ibmveth to talk to AIX LPARs, a
bnx2x network card to talk to the world, and Open vSwitch to tie them
together. There is no VIOS involvement - the card is attached to the
Linux partition.]

The packets causing the issue come through the ibmveth interface -
from the AIX LPAR. The veth protocol is 'special' - communication
between LPARs on the same chassis can use very large (64k) frames to
reduce overhead. Normal networks cannot handle such large packets, so
traditionally, the VIOS partition would signal to the AIX partitions
that it was 'special', and AIX would send regular, ethernet-sized
packets to VIOS, which VIOS would then send out.

This signalling between VIOS and AIX is done in a way that is not
standards-compliant, and so was never made part of Linux. Instead, the
Linux driver has always understood large frames and passed them up the
network stack.

In some cases (e.g. with TCP), multiple TCP segments are coalesced
into one large packet. In Linux, this goes through the generic receive
offload (GRO) code, using a similar mechanism to GSO. These coalesced
segments can be very large, which presents as a very large MSS
(maximum segment size), or gso_size.
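
For reference, this GSO metadata lives in the skb's shared info. A
minimal sketch of where a driver would read it (the helper name is
mine, not kernel code):

    #include <linux/printk.h>
    #include <linux/skbuff.h>

    /* Illustrative helper (hypothetical name): dump the GSO metadata
     * that GRO attaches to a coalesced packet. gso_size is the
     * per-segment payload size the packet is expected to be cut back
     * into; for frames aggregated from ibmveth it can far exceed the
     * usual ~1448 bytes.
     */
    static void example_log_gso_metadata(const struct sk_buff *skb)
    {
            if (skb_is_gso(skb))
                    pr_info("gso_size=%u gso_segs=%u\n",
                            skb_shinfo(skb)->gso_size,
                            skb_shinfo(skb)->gso_segs);
    }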

Normally, the large packet is simply passed to whatever network
application on Linux is going to consume it, and everything is OK.

However, in this case, the packets go through Open vSwitch, and are
then passed to the bnx2x driver. The bnx2x driver/hardware supports
TSO and GSO, but with a restriction: the maximum segment size is
limited to around 9700 bytes. Normally this is more than adequate as
jumbo frames are limited to 9000 bytes. However, if a large packet
with large (>9700 byte) TCP segments arrives through ibmveth, and is
passed to bnx2x, the hardware will panic.
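
As a rough illustration of the mismatch (this is not the actual bnx2x
code or the pending patch; the limit constant and function name are
placeholders), a transmit-path guard for such packets could look like:

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    #define EXAMPLE_HW_MAX_GSO_SIZE 9700   /* placeholder for the hw limit */

    static netdev_tx_t example_start_xmit(struct sk_buff *skb,
                                          struct net_device *dev)
    {
            /* A GSO packet whose segment size exceeds what the firmware
             * can handle must not reach the hardware; dropping it avoids
             * the firmware assertion, at the cost of forcing a retransmit.
             */
            if (skb_is_gso(skb) &&
                skb_shinfo(skb)->gso_size > EXAMPLE_HW_MAX_GSO_SIZE) {
                    dev_kfree_skb_any(skb);
                    dev->stats.tx_dropped++;
                    return NETDEV_TX_OK;
            }

            /* ... normal transmit path ... */
            return NETDEV_TX_OK;
    }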

Turning off TSO prevents the crash as the kernel resegments the data
and assembles the packets in software. This has a performance cost.
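
What "resegments the data in software" amounts to is roughly the
validate_xmit_skb()/skb_gso_segment() path in the core stack; a
simplified sketch (error handling trimmed, function name hypothetical):

    #include <linux/err.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    static struct sk_buff *example_soft_segment(struct sk_buff *skb,
                                                netdev_features_t features)
    {
            struct sk_buff *segs;

            /* Cut the large GSO packet into MSS-sized skbs on the CPU;
             * this is where the performance cost comes from.
             */
            segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
            if (IS_ERR_OR_NULL(segs))
                    return NULL;

            consume_skb(skb);  /* the coalesced original is no longer needed */
            return segs;       /* linked list of MSS-sized segments */
    }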

Clearly at the very least, bnx2x should not crash in this case, and I
am working towards a patch for that.

However, this still leaves us with some issues. The only thing the
bnx2x driver can sensibly do is drop the packet, which will prevent
the crash. However, there will still be issues with large packets:
when they are dropped, the other side will eventually realise that the
data is missing and ask for a retransmit, but the retransmit might
also be too big - there's no way of signalling back to the AIX LPAR
that it should reduce the MSS. Even if the data eventually gets
through, there will be a latency/throughput/performance hit.

==========

Seeing as IBM seems to be actively developing in this area - indeed,
this code explicitly deals with ibmveth + OVS - could someone from IBM
review this?

https://bugs.launchpad.net/bugs/1692538

Title:
  Ubuntu 16.04.02: ibmveth: Support to enable LSO/CSO for Trunk VEA

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Zesty:
  Fix Released
Status in linux source package in Artful:
  Fix Released

Bug description:
  
  == SRU Justification ==
  Commit 66aa0678ef is a request to fix four issues in the ibmveth driver.
  The issues are as follows:
  - Issue 1: ibmveth doesn't support largesend and checksum offload features
  when configured as "Trunk".
  - Issue 2: SYN packet drops are seen at the destination VM. When the packet
  originates, it has the CHECKSUM_PARTIAL flag set; as it gets delivered to
  the IO server's inbound Trunk ibmveth, the receive routine validates the
  "checksum good" bits and sets the SKB's ip_summed field to
  CHECKSUM_UNNECESSARY.
  - Issue 3: The first packet of a TCP connection will be dropped if there is
  no OVS flow cached in the datapath.
  - Issue 4: The ibmveth driver doesn't support SKBs with a frag_list (see the
  feature-flag sketch after this list).
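
  A minimal, illustrative sketch (not the literal upstream change) of the
  feature advertisement a driver needs before the core will hand it large
  sends and frag_list SKBs; the function name is hypothetical:

      #include <linux/netdevice.h>

      /* Hypothetical probe-time helper: advertise scatter-gather, checksum
       * offload, TSO and frag_list support so validate_xmit_skb() no longer
       * has to segment or linearize packets on this device's behalf.
       */
      static void example_enable_offloads(struct net_device *netdev)
      {
              netdev->hw_features |= NETIF_F_SG | NETIF_F_IP_CSUM |
                                     NETIF_F_IPV6_CSUM | NETIF_F_TSO |
                                     NETIF_F_FRAGLIST;
              netdev->features |= netdev->hw_features;
      }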

  The details of the fixes for these issues are described in the commit's
  git log.



  == Comment: #0 - BRYANT G. LY <b...@us.ibm.com> - 2017-05-22 08:40:16 ==
  ---Problem Description---

   - Issue 1: ibmveth doesn't support largesend and checksum offload features
     when configured as "Trunk". The driver has explicit checks that prevent
     enabling these offloads.

   - Issue 2: SYN packet drops are seen at the destination VM. When the packet
     originates, it has the CHECKSUM_PARTIAL flag set; as it gets delivered to
     the IO server's inbound Trunk ibmveth, the receive routine validates the
     "checksum good" bits and sets the SKB's ip_summed field to
     CHECKSUM_UNNECESSARY. This packet is then bridged by OVS (or Linux
     Bridge) and delivered to the outbound Trunk ibmveth. At this point the
     outbound ibmveth transmit routine will not set the "no checksum" and
     "checksum good" bits in the transmit buffer descriptor, as it does so
     only when the ip_summed field is CHECKSUM_PARTIAL (see the sketch after
     this list). When this packet gets delivered to the destination VM, the
     TCP layer receives it with a checksum value of 0 and no checksum-related
     flags in the ip_summed field. This leads to packet drops, so TCP
     connections never get established.

   - Issue 3: The first packet of a TCP connection will be dropped if there is
     no OVS flow cached in the datapath. OVS, while trying to identify the
     flow, computes the checksum. The computed checksum will be invalid at
     the receiving end, as the ibmveth transmit routine zeroes out the pseudo
     checksum value in the packet. This leads to packet drops.

   - Issue 4: The ibmveth driver doesn't support SKBs with a frag_list.
     When the physical NIC has GRO enabled and OVS bridges these packets,
     the OVS vport send code ends up calling dev_queue_xmit, which in turn
     calls validate_xmit_skb.
     In validate_xmit_skb, larger packets are segmented into MSS-sized
     segments if the SKB has a frag_list and the driver they are delivered
     to doesn't advertise the NETIF_F_FRAGLIST feature.
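
  To make Issue 2 concrete, here is a minimal sketch of the transmit-side
  logic described above. This is not the ibmveth source; the descriptor bits
  and the helper name are hypothetical stand-ins:

      #include <linux/skbuff.h>
      #include <linux/types.h>

      #define EXAMPLE_DESC_NO_CSUM    0x1  /* hypothetical descriptor bit */
      #define EXAMPLE_DESC_CSUM_GOOD  0x2  /* hypothetical descriptor bit */

      static void example_fill_desc_csum(const struct sk_buff *skb,
                                         u32 *desc_flags)
      {
              if (skb->ip_summed == CHECKSUM_PARTIAL)
                      *desc_flags |= EXAMPLE_DESC_NO_CSUM |
                                     EXAMPLE_DESC_CSUM_GOOD;
              /* A bridged packet arriving here with CHECKSUM_UNNECESSARY
               * falls through, so the descriptor carries no checksum
               * information and the destination VM's TCP layer sees a zero
               * checksum -- hence the SYN drops described in Issue 2.
               */
      }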

  Contact Information = Bryant G. Ly/b...@us.ibm.com

  ---uname output---
  4.8.0-51.54

  Machine Type = p8

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
   Increases performance greatly

  The patch has been accepted upstream:
  https://patchwork.ozlabs.org/patch/764533/

