On Sun, Sep 3, 2017 at 6:43 PM, Tom Herbert <t...@herbertland.com> wrote:
> On Sat, Sep 2, 2017 at 9:11 PM, Saeed Mahameed <sae...@dev.mellanox.co.il>
> wrote:
>> On Sat, Sep 2, 2017 at 6:37 PM, Tom Herbert <t...@herbertland.com> wrote:
>>> On Sat, Sep 2, 2017 at 6:32 PM, Hannes Frederic Sowa
>>> <han...@stressinduktion.org> wrote:
>>>> Hi Saeed,
>>>>
>>>> On Sun, Sep 3, 2017, at 01:01, Saeed Mahameed wrote:
>>>>> On Thu, Aug 31, 2017 at 6:51 AM, Hannes Frederic Sowa
>>>>> <han...@stressinduktion.org> wrote:
>>>>> > Saeed Mahameed <sae...@mellanox.com> writes:
>>>>> >
>>>>> >> The first patch from Gal and Ariel provides the mlx5 driver support for
>>>>> >> the ConnectX capability to perform IP version identification and
>>>>> >> matching in order to distinguish between IPv4 and IPv6 without the need
>>>>> >> to specify the encapsulation type, and thus to perform RSS on MPLS
>>>>> >> traffic automatically without specifying the MPLS ethertype. This patch
>>>>> >> will also serve for inner GRE IPv4/6 classification for inner GRE RSS.
>>>>> >
>>>>> > I don't think this is legal at all, or did I misunderstand something?
>>>>> >
>>>>> > <https://tools.ietf.org/html/rfc3032#section-2.2>
>>>>>
>>>>> It seems you misunderstood the cover letter. The HW will still
>>>>> identify MPLS (IPv4/IPv6) packets, using a new bit we specify in the HW
>>>>> steering rules rather than adding new specific rules with {MPLS
>>>>> ethertype} X {IPv4,IPv6} to classify MPLS IPv{4,6} traffic. Same
>>>>> functionality, but a better and more general way to approach it.
>>>>> Bottom line: the hardware is capable of processing MPLS headers and
>>>>> performing RSS on the inner packet (IPv4/6) without the need for the
>>>>> driver to provide precise MPLS steering rules.
>>>>
>>>> Sorry, I think I am still confused.
>>>>
>>>> I just want to make sure that you don't use the first nibble after the
>>>> MPLS bottom-of-stack label in any way as an indicator of whether that is
>>>> an IPv4 or IPv6 packet by default. It can be anything. The forwarding
>>>> equivalence class tells the stack which protocol you see.
>>>>
>>>> If you match on the first nibble behind the MPLS bottom-of-stack label,
>>>> the '4' or '6' respectively could be part of a MAC address whose first
>>>> nibble is 4 or 6, because the particular pseudowire is EoMPLS and uses
>>>> no control word.
>>>>
>>>> I wanted to mention it because with the addition of e.g. VPLS this could
>>>> cause problems down the road, so shouldn't it at least be controllable?
>>>> It is probably better to use Entropy Labels in the future.
>>>>
>>> Or just use IPv6 with flow label for RSS (or MPLS/UDP, GRE/UDP if you
>>> prefer); then all this protocol-specific DPI for RSS just goes away ;-)
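Hannes's concern can be made concrete with a small sketch of the heuristic he warns against: walk the MPLS label stack (RFC 3032 entries are Label(20) | TC(3) | S(1) | TTL(8)) until the bottom-of-stack bit, then read the next nibble as an IP version. The helpers `mpls_payload_offset` and `mpls_guess_ip_version` are hypothetical and are not a description of what the mlx5/ConnectX hardware does; the point is only to show why the guess is unsafe. An RFC 4385 pseudowire control word starts with the nibble 0000, so it is not mistaken for IP, but an EoMPLS pseudowire without one exposes the inner destination MAC directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MPLS label stack entry (RFC 3032): Label(20) | TC(3) | S(1) | TTL(8). */
#define MPLS_LSE_LEN   4
#define MPLS_LSE_S_BIT 0x00000100u /* bottom-of-stack flag */

/* Read a 32-bit big-endian (network order) word. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: offset of the first byte after the bottom-of-stack
 * label, or -1 if no label with S=1 fits in the buffer. */
static long mpls_payload_offset(const uint8_t *pkt, size_t len)
{
    size_t off = 0;

    assert(pkt != NULL);
    while (off + MPLS_LSE_LEN <= len) {
        uint32_t lse = read_be32(pkt + off);

        off += MPLS_LSE_LEN;
        if (lse & MPLS_LSE_S_BIT)
            return (long)off;
    }
    return -1;
}

/* The heuristic being questioned: read the first nibble after the label
 * stack as an IP version. Returns 4, 6, or 0 (unknown). The payload type
 * is really bound to the label (FEC), so this guess can be wrong. */
static int mpls_guess_ip_version(const uint8_t *pkt, size_t len)
{
    long off = mpls_payload_offset(pkt, len);

    if (off < 0 || (size_t)off >= len)
        return 0;
    switch (pkt[off] >> 4) {
    case 4:
        return 4;
    case 6:
        return 6;
    default:
        return 0; /* e.g. 0000, the first nibble of an RFC 4385 PW control word */
    }
}
```

With an EoMPLS pseudowire that carries no control word, the bytes after the label stack are the inner destination MAC, so a MAC such as 46:11:22:33:44:55 is reported as IPv4, which is exactly the misclassification described above.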
>> How does MPLS/UDP or GRE/UDP RSS work without protocol-specific DPI?
>> Unlike VXLAN, those protocols are not over UDP, and you can't just play
>> with the outer header's UDP source port, or can you?
>> Can you elaborate?
>
> An encapsulator sets the UDP source port to be the flow entropy of the
> packet being encapsulated. So when the packet traverses the network,
> devices can base their hash just on the canonical 5-tuple, which is
> sufficient for ECMP and RSS. The IPv6 flow label is even better, since
> middleboxes don't even need to look at the transport header: a packet
> is steered based on the 3-tuple of addresses and flow label. In the
> Linux stack, udp_flow_src_port is used by UDP encapsulations to set
> the source port. The flow label is similarly set by ip6_make_flowlabel.
> Both of these functions use skb->hash, which is computed by calling
> the flow dissector at most once per packet (if the packet was received
> with an L4 HW hash, or was locally originated on a connection, the hash
> does not need to be computed).

Hi Tom,

Re all sorts of UDP encap, sure, we're all on board with the less-is-more
approach and just RSS-ing on the IP+UDP encap header. For GRE, I was trying
to push back on RSS-ing on the inner headers, but as Saeed commented, we
didn't see anything simple through which the HW could do the spreading.
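Tom's point about UDP encapsulation can be sketched as follows: the encapsulator turns the inner flow's hash into the outer UDP source port, so the outer 5-tuple alone carries the entropy. `flow_hash_to_src_port` is a hypothetical user-space function modeled on the approach of the kernel's udp_flow_src_port (mix the hash, then scale it into a port range with a multiply-shift); it is not the kernel code, which also handles a missing skb->hash and returns the port in network byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space sketch modeled on the kernel's udp_flow_src_port():
 * map a 32-bit flow hash onto an outer UDP source port in [min, max).
 * Returns host byte order (the kernel helper returns network order). */
static uint16_t flow_hash_to_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
    assert(min < max);

    /* Mix the low 16 bits into the high 16 bits; the multiply-shift
     * below is driven mainly by the high bits. */
    hash ^= hash << 16;

    /* Scale into the range without a modulo:
     * floor(hash * range / 2^32) lies in [0, range). */
    return (uint16_t)((((uint64_t)hash * (uint32_t)(max - min)) >> 32) + min);
}
```

Because the port depends only on the inner flow's hash, every packet of one inner flow gets the same outer 5-tuple, so ECMP and RSS keep that flow on one path or queue while different inner flows spread out, with no parsing of GRE or MPLS required.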
To make sure I follow: you are saying that if this is gre6 tunneling, the
sender side (ip6_tnl_xmit?) will set the IPv6 flow label in a manner
similar to udp_flow_src_port, and if the receiver side hashes on the L3
IPv6 src/dst/flow label, we'll get spreading? Nice!

net-next.git]# git grep -p ip6_make_flowlabel net/ include/linux/ include/net/
include/net/ipv6.h=static inline void iph_to_flow_copy_v6addrs(struct flow_keys *flow,
include/net/ipv6.h:static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
include/net/ipv6.h=static inline void ip6_set_txhash(struct sock *sk) { }
include/net/ipv6.h:static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
net/ipv6/ip6_gre.c=static int ip6gre_header(struct sk_buff *skb, struct net_device *dev,
net/ipv6/ip6_gre.c: ip6_make_flowlabel(dev_net(dev), skb,
net/ipv6/ip6_output.c=int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
net/ipv6/ip6_output.c: ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
net/ipv6/ip6_output.c=struct sk_buff *__ip6_make_skb(struct sock *sk,
net/ipv6/ip6_output.c: ip6_make_flowlabel(net, skb, fl6->flowlabel,
net/ipv6/ip6_tunnel.c=int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
net/ipv6/ip6_tunnel.c: ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));

Still, what do we do with IPv4 GRE tunnels? And what do we do with HW
that isn't capable of RSS on the flow label?

> Please look at
> https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
> as well as Davem's "Less is More" presentation, which highlights the
> virtues of protocol-generic HW mechanisms
> (https://www.youtube.com/watch?v=6VgmazGwL_Y).
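The gre6 case discussed here can be sketched end to end: the sender folds its flow hash into the 20-bit IPv6 flow label, and the receiver picks an RX queue from {source address, destination address, flow label} without looking past the 40-byte IPv6 header. All function names are hypothetical; the 16-bit rotate in `flow_hash_to_flowlabel` is an assumed mixing step, not a claim about what ip6_make_flowlabel does in a given kernel version, and FNV-1a stands in for the Toeplitz hash that RSS hardware typically uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN        40
#define IPV6_FLOWLABEL_MASK 0x000FFFFFu /* low 20 bits of the first header word */

/* Sender side (sketch): derive a 20-bit flow label from a flow hash.
 * ASSUMPTION: rotate by 16 then mask; the kernel's exact mixing may differ. */
static uint32_t flow_hash_to_flowlabel(uint32_t hash)
{
    hash = (hash << 16) | (hash >> 16);
    return hash & IPV6_FLOWLABEL_MASK;
}

/* Write the flow label into a raw IPv6 header, keeping version/traffic class. */
static void ipv6_set_flowlabel(uint8_t *hdr, uint32_t label)
{
    hdr[1] = (uint8_t)((hdr[1] & 0xF0) | ((label >> 16) & 0x0F));
    hdr[2] = (uint8_t)(label >> 8);
    hdr[3] = (uint8_t)label;
}

/* Read the flow label back out of a raw IPv6 header. */
static uint32_t ipv6_flowlabel(const uint8_t *hdr)
{
    return (((uint32_t)hdr[1] & 0x0F) << 16) | ((uint32_t)hdr[2] << 8) | hdr[3];
}

/* Receiver side (sketch): choose a queue from the IPv6 3-tuple only.
 * FNV-1a over src (bytes 8..23), dst (24..39) and the 20-bit label. */
static unsigned ipv6_3tuple_queue(const uint8_t *hdr, unsigned nqueues)
{
    uint32_t h = 2166136261u; /* FNV-1a offset basis */
    uint32_t fl = ipv6_flowlabel(hdr);

    assert(nqueues > 0);
    for (size_t i = 8; i < IPV6_HDR_LEN; i++) {
        h ^= hdr[i];
        h *= 16777619u; /* FNV-1a prime */
    }
    for (int s = 16; s >= 0; s -= 8) {
        h ^= (fl >> s) & 0xFF;
        h *= 16777619u;
    }
    return h % nqueues;
}
```

Nothing on the receive path depends on GRE or any other payload type, which is the protocol-generic property argued for in the thread. It does, however, require an IPv6 outer header and HW able to hash on the flow label, the two gaps raised in the question above.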