On Sun, Sep 3, 2017 at 6:43 PM, Tom Herbert <t...@herbertland.com> wrote:
> On Sat, Sep 2, 2017 at 9:11 PM, Saeed Mahameed <sae...@dev.mellanox.co.il> wrote:
>> On Sat, Sep 2, 2017 at 6:37 PM, Tom Herbert <t...@herbertland.com> wrote:
>>> On Sat, Sep 2, 2017 at 6:32 PM, Hannes Frederic Sowa
>>> <han...@stressinduktion.org> wrote:
>>>> Hi Saeed,
>>>>
>>>> On Sun, Sep 3, 2017, at 01:01, Saeed Mahameed wrote:
>>>>> On Thu, Aug 31, 2017 at 6:51 AM, Hannes Frederic Sowa
>>>>> <han...@stressinduktion.org> wrote:
>>>>> > Saeed Mahameed <sae...@mellanox.com> writes:
>>>>> >
>>>>> >> The first patch from Gal and Ariel provides the mlx5 driver support for
>>>>> >> ConnectX capability to perform IP version identification and matching in
>>>>> >> order to distinguish between IPv4 and IPv6 without the need to specify the
>>>>> >> encapsulation type, thus perform RSS in MPLS automatically without
>>>>> >> specifying MPLS ethertype. This patch will also serve for inner GRE IPv4/6
>>>>> >> classification for inner GRE RSS.
>>>>> >
>>>>> > I don't think this is legal at all, or did I misunderstand something?
>>>>> >
>>>>> > <https://tools.ietf.org/html/rfc3032#section-2.2>
>>>>>
>>>>> It seems you misunderstood the cover letter. The HW will still
>>>>> identify MPLS (IPv4/IPv6) packets, using a new bit we specify in the HW
>>>>> steering rules rather than adding new specific rules with {MPLS
>>>>> ethertype} X {IPv4,IPv6} to classify MPLS IPv{4,6} traffic. Same
>>>>> functionality, but a better and more general way to approach it.
>>>>> Bottom line: the hardware is capable of processing MPLS headers and
>>>>> performing RSS on the inner packet (IPv4/6) without the need for the
>>>>> driver to provide precise MPLS steering rules.
>>>>
>>>> Sorry, I think I am still confused.
>>>>
>>>> I just want to make sure that you don't use the first nibble after the
>>>> MPLS bottom-of-stack label in any way as an indicator of whether that is
>>>> an IPv4 or IPv6 packet by default. It can be anything. The forwarding
>>>> equivalence class tells the stack which protocol you see.
>>>>
>>>> If you match on the first nibble behind the MPLS bottom-of-stack label,
>>>> the '4' or '6' respectively could be part of a MAC address with its
>>>> first nibble being 4 or 6, because the particular pseudowire is EoMPLS
>>>> and uses no control word.
>>>>
>>>> I wanted to mention it, because with addition of e.g. VPLS this could
>>>> cause problems down the road and should at least be controllable? It is
>>>> probably better to use Entropy Labels in future.
>>>>
>>> Or just use IPv6 with flow label for RSS (or MPLS/UDP, GRE/UDP if you
>>> prefer) then all this protocol specific DPI for RSS just goes away ;-)

>> How does MPLS/UDP or GRE/UDP RSS work without protocol-specific DPI?
>> Unlike VXLAN, those protocols are not over UDP and you can't just play
>> with the outer header UDP src port, or can you?
>> Can you elaborate?

> An encapsulator sets the UDP source port to be the flow entropy of the
> packet being encapsulated. So when the packet traverses the network,
> devices can base their hash just on the canonical 5-tuple, which is
> sufficient for ECMP and RSS. IPv6 flow label is even better since the
> middleboxes don't even need to look at the transport header; a packet
> is steered based on the 3-tuple of addresses and flow label. In the
> Linux stack, udp_flow_src_port is used by UDP encapsulations to set
> the source port. Flow label is similarly set by ip6_make_flowlabel.
> Both of these functions use the skb->hash, which is computed by calling
> flow dissector at most once per packet (if the packet was received
> with an L4 HW hash or locally originated on a connection, the hash does
> not need to be computed).
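
To make the source-port idea above concrete, here is a minimal user-space sketch of scaling a 32-bit flow hash into a UDP source port range. It is only an illustration under stated assumptions: sketch_flow_src_port and its parameters are hypothetical names, the flow hash is assumed to be computed already (the role skb->hash plays in the kernel), and this is not a copy of the kernel's udp_flow_src_port.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel function): map a 32-bit flow hash
 * onto a source port in [min, max). Every packet of a flow carries the
 * same hash, so it gets the same outer source port, and devices hashing
 * on the outer 5-tuple keep the flow on one path or queue.
 */
static uint16_t sketch_flow_src_port(uint32_t flow_hash,
                                     uint16_t min, uint16_t max)
{
    assert(min < max);  /* caller must supply a non-empty port range */

    uint32_t range = (uint32_t)max - (uint32_t)min;

    /* Multiply-and-shift scales the hash into [0, range) without a modulo. */
    uint32_t offset = (uint32_t)(((uint64_t)flow_hash * range) >> 32);

    return (uint16_t)(min + offset);
}
```

With a range of 49152..65535, for example, a hash of 0 maps to port 49152 and different inner flows spread across the rest of the range, while the result stays stable for any given hash.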

Hi Tom,

Re all sorts of UDP encap, sure, we're all on the less-is-more thing and just
RSS-ing on the IP+UDP encap header.

For GRE, I was trying to push back on RSS-ing on the inner headers, but as
Saeed commented, we didn't see something simple through which the HW can do
spreading. To make sure I follow, you are saying that if this is gre6 tunneling
net-next.git]# git grep -p ip6_make_flowlabel net/ include/linux/ include/net/
include/net/ipv6.h=static inline void iph_to_flow_copy_v6addrs(struct flow_keys *flow,
include/net/ipv6.h:static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
include/net/ipv6.h=static inline void ip6_set_txhash(struct sock *sk) { }
include/net/ipv6.h:static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
net/ipv6/ip6_gre.c=static int ip6gre_header(struct sk_buff *skb, struct net_device *dev,
net/ipv6/ip6_gre.c:                  ip6_make_flowlabel(dev_net(dev), skb,
net/ipv6/ip6_output.c=int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
net/ipv6/ip6_output.c:  ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
net/ipv6/ip6_output.c=struct sk_buff *__ip6_make_skb(struct sock *sk,
net/ipv6/ip6_output.c:               ip6_make_flowlabel(net, skb, fl6->flowlabel,
net/ipv6/ip6_tunnel.c=int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
net/ipv6/ip6_tunnel.c:               ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));

the sender side (ip6_tnl_xmit?) will set the IPv6 flow label in a manner
similar to what udp_flow_src_port does for the source port? And if the
receiver side hashes on the L3 IPv6 src/dst/flow label, we'll get
spreading? Nice!
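
To spell out what I mean by hashing on src/dst/flow label at the receiver, here is a rough user-space sketch. All names in it (struct v6_flow_key, sketch_pick_queue) are made up for illustration, and FNV-1a is just a simple stand-in hash; real NICs typically use a Toeplitz hash with a configured key, so take this as the shape of the idea, not as what any HW does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 3-tuple key: outer IPv6 addresses plus the 20-bit flow label. */
struct v6_flow_key {
    uint8_t  saddr[16];
    uint8_t  daddr[16];
    uint32_t flow_label;   /* only the low 20 bits are significant */
};

/* FNV-1a over a byte range; a stand-in for whatever hash the HW implements. */
static uint32_t fnv1a(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Pick a receive queue from the 3-tuple alone, no transport header needed. */
static unsigned int sketch_pick_queue(const struct v6_flow_key *k,
                                      unsigned int nqueues)
{
    assert(nqueues > 0);

    uint32_t label = k->flow_label & 0xFFFFFu;  /* mask to 20 bits */
    uint8_t lb[3] = {
        (uint8_t)(label >> 16),
        (uint8_t)(label >> 8),
        (uint8_t)label,
    };

    uint32_t h = 2166136261u;                   /* FNV-1a offset basis */
    h = fnv1a(h, k->saddr, sizeof(k->saddr));
    h = fnv1a(h, k->daddr, sizeof(k->daddr));
    h = fnv1a(h, lb, sizeof(lb));

    return h % nqueues;
}
```

Two tunneled flows between the same endpoints differ only in the flow label the sender derived from the inner flow hash, so they can land on different queues without the receiver parsing past the outer IPv6 header.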

Still, what do we do with IPv4 GRE tunnels? And what do we do with HW
that isn't capable of RSS on the flow label?

> Please look at 
> https://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf
> as well as Davem's "Less is More" presentation which highlights the
> virtues of protocol generic HW mechanisms
> (https://www.youtube.com/watch?v=6VgmazGwL_Y).
