On Sat, Sep 2, 2017 at 9:11 PM, Saeed Mahameed <sae...@dev.mellanox.co.il> wrote:
> On Sat, Sep 2, 2017 at 6:37 PM, Tom Herbert <t...@herbertland.com> wrote:
>> On Sat, Sep 2, 2017 at 6:32 PM, Hannes Frederic Sowa
>> <han...@stressinduktion.org> wrote:
>>> Hi Saeed,
>>>
>>> On Sun, Sep 3, 2017, at 01:01, Saeed Mahameed wrote:
>>>> On Thu, Aug 31, 2017 at 6:51 AM, Hannes Frederic Sowa
>>>> <han...@stressinduktion.org> wrote:
>>>> > Saeed Mahameed <sae...@mellanox.com> writes:
>>>> >
>>>> >> The first patch from Gal and Ariel provides the mlx5 driver support for
>>>> >> ConnectX capability to perform IP version identification and matching in
>>>> >> order to distinguish between IPv4 and IPv6 without the need to specify
>>>> >> the encapsulation type, thus perform RSS in MPLS automatically without
>>>> >> specifying MPLS ethertype. This patch will also serve for inner GRE
>>>> >> IPv4/6 classification for inner GRE RSS.
>>>> >
>>>> > I don't think this is legal at all or did I misunderstand something?
>>>> >
>>>> > <https://tools.ietf.org/html/rfc3032#section-2.2>
>>>>
>>>> It seems you misunderstood the cover letter. The HW will still
>>>> identify MPLS (IPv4/IPv6) packets using a new bit we specify in the HW
>>>> steering rules rather than adding new specific rules with {MPLS
>>>> ethertype} X {IPv4,IPv6} to classify MPLS IPv{4,6} traffic, Same
>>>> functionality a better and general way to approach it.
>>>> Bottom line the hardware is capable of processing MPLS headers and
>>>> perform RSS on the inner packet (IPv4/6) without the need of the
>>>> driver to provide precise steering MPLS rules.
>>>
>>> Sorry, I think I am still confused.
>>>
>>> I just want to make sure that you don't use the first nibble after the
>>> mpls bottom of stack label in any way as an indicator if that is an IPv4
>>> or IPv6 packet by default. It can be anything. The forward equivalence
>>> class tells the stack which protocol you see.
>>>
>>> If you match on the first nibble behind the MPLS bottom of stack label
>>> the '4' or '6' respectively could be part of a MAC address with its
>>> first nibble being 4 or 6, because the particular pseudowire is EoMPLS
>>> and uses no control word.
>>>
>>> I wanted to mention it, because with addition of e.g. VPLS this could
>>> cause problems down the road and should at least be controllable? It is
>>> probably better to use Entropy Labels in future.
>>>
>> Or just use IPv6 with flow label for RSS (or MPLS/UDP, GRE/UDP if you
>> prefer) then all this protocol specific DPI for RSS just goes away ;-)
>
> Hi Tom,
>
> How does MPLS/UDP or GRE/UDP RSS work without protocol specific DPI ?
> unlike vxlan those protocols are not over UDP and you can't just play
> with the outer header udp src port, or do you ?
>
> Can you elaborate ?
>

Hi Saeed,
An encapsulator sets the UDP source port to the flow entropy of the
packet being encapsulated. So when the packet traverses the network,
devices can base their hash just on the canonical 5-tuple, which is
sufficient for ECMP and RSS. IPv6 flow label is even better since the
middleboxes don't even need to look at the transport header; a packet
is steered based on the 3-tuple of addresses and flow label.

In the Linux stack, udp_flow_src_port is used by UDP encapsulations to
set the source port. Flow label is similarly set by
ip6_make_flowlabel. Both of these functions use the skb->hash, which is
computed by calling flow dissector at most once per packet (if the
packet was received with an L4 HW hash or locally originated on a
connection, the hash does not need to be computed).

Please look at
https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
as well as Davem's "Less is More" presentation, which highlights the
virtues of protocol generic HW mechanisms
(https://www.youtube.com/watch?v=6VgmazGwL_Y).

Tom
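
To make the source-port idea above concrete, here is a rough userspace sketch in Python. It is not the kernel code: flow_hash() is a hypothetical stand-in for skb->hash (CRC32 is used only for illustration), the port range is an assumed one, and the flow label derivation is a plain 20-bit mask rather than whatever ip6_make_flowlabel actually does. pick_queue() is likewise a made-up model of a device that hashes nothing but the outer 5-tuple.

```python
import zlib

# Assumed source port range for this sketch; a real stack takes it from config.
PORT_MIN, PORT_MAX = 32768, 61000


def flow_hash(src_ip, dst_ip, proto, sport, dport):
    """Hypothetical stand-in for skb->hash: 32-bit hash of the inner 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(key) & 0xFFFFFFFF


def src_port_from_hash(h, lo=PORT_MIN, hi=PORT_MAX):
    """Scale a 32-bit hash into [lo, hi) for use as the outer UDP source port."""
    return ((h * (hi - lo)) >> 32) + lo


def flow_label_from_hash(h):
    """Keep 20 bits of the hash, the width of the IPv6 flow label field."""
    return h & 0xFFFFF


def pick_queue(outer_5tuple, n_queues):
    """Model of a device hashing only the outer 5-tuple (no tunnel parsing)."""
    return zlib.crc32(repr(outer_5tuple).encode()) % n_queues


# Two inner TCP flows between the same hosts, carried in the same tunnel.
inner_a = ("10.0.0.1", "10.0.0.2", 6, 40000, 80)
inner_b = ("10.0.0.1", "10.0.0.2", 6, 40001, 80)

port_a = src_port_from_hash(flow_hash(*inner_a))
port_b = src_port_from_hash(flow_hash(*inner_b))

# Outer headers: same tunnel endpoints, UDP (17), destination port 6635
# (MPLS-in-UDP). Only the source port carries the inner flow's entropy.
outer_a = ("192.0.2.1", "192.0.2.2", 17, port_a, 6635)
outer_b = ("192.0.2.1", "192.0.2.2", 17, port_b, 6635)

print(port_a, pick_queue(outer_a, 8))
print(port_b, pick_queue(outer_b, 8))
```

The receiver never looks past the outer UDP header here, yet packets of one inner flow always map to the same queue because the source port is a pure function of the inner flow's hash. With a flow label the same holds using only addresses plus flow_label_from_hash().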