Re: [PATCH net-next v2] net: ipv4: add support for ECMP hash policy choice

2017-03-12 Thread David Miller
From: Jakub Sitnicki 
Date: Wed, 08 Mar 2017 17:00:05 +0100

> On Wed, Mar 08, 2017 at 12:43 PM GMT, Nikolay Aleksandrov wrote:
>> On 08/03/17 14:05, Jakub Sitnicki wrote:
>>> On Tue, Mar 07, 2017 at 11:01 AM GMT, Nikolay Aleksandrov wrote:
 This patch adds support for ECMP hash policy choice via a new sysctl
 called fib_multipath_hash_policy and also adds support for L4 hashes.
 The current values for fib_multipath_hash_policy are:
  0 - layer 3 (default)
  1 - layer 4
 If there's an skb hash already set and it matches the chosen policy then it
 will be used instead of being calculated. The ICMP inner IP addresses use
 is removed.

 Signed-off-by: Nikolay Aleksandrov 
 ---
 v2:
  - removed the output_key_hash as it's not needed anymore
  - reverted to my original/internal patch with L3 as default hash
>>> 
>>> What about ICMP PTB (Fragmentation Needed) forwarding that makes PMTUD
>>> work with ECMP in setups like described in RFC7690 [1]?
>>> 
>>>   ptb -> router ecmp -> next hop L4/L7 load balancer -> destination
>>> 
>>>router --> load balancer 1 --->
>>> \\--> load balancer 2 ---> load-balanced service
>>>  \--> load balancer N --->
>>> 
>>> Removing special treatment of ICMP errors will break it, won't it?
>>> 
>>
>> Yes, I am aware and this decision was made with that in mind.
>> We'd like to use the HW hash when available and IIRC that doesn't play well 
>> with
>> special-casing ICMP errors for anycast as it may not match also. Another 
>> thing,
>> again if I remember correctly, was that this behaviour is closer to how 
>> hardware
>> handles ECMP.
> 
> OK, I wanted to make sure that is not an oversight that ECMP routing in
> ipv4 stack is to be dumbed down to match the hardware behavior. I
> thought that it was an advantage that we want to have over hardware
> routers. (To be fair, I should mention that we don't have it in ipv6
> stack ATM.)
> 
>>
>> One thing we can do is leave the current L3 behaviour with ICMP error 
>> handling
>> and add a new L3 mode that tries to use the skb hash when available and 
>> doesn't
>> care about the packet type.
>>
>> What do you think ?
> 
> Sounds good to me. Would be good to hear other opinions also.

This would be so much less of an issue with symmetric hashing.  A quick glance
seems to indicate that Microsoft didn't specify the Toeplitz hash to order the
ports and the addresses so that it would be symmetric, so we can guess what
every piece of hardware out there computing a hash does :-/

We could solve this ICMP problem by using a symmetric hash (flow
dissector already supports this), but then we're back to the problem
that this behaves differently from card computed hashes and hardware
offload of ECMP.

I have to say that losing this ICMP handling makes the current code a
non-starter.  The existing code explicitly was written to handle this
case properly, so just undoing it and making it stop working is not
really something we can do.


Re: [PATCH net-next v2] net: ipv4: add support for ECMP hash policy choice

2017-03-08 Thread Jakub Sitnicki
On Wed, Mar 08, 2017 at 12:43 PM GMT, Nikolay Aleksandrov wrote:
> On 08/03/17 14:05, Jakub Sitnicki wrote:
>> On Tue, Mar 07, 2017 at 11:01 AM GMT, Nikolay Aleksandrov wrote:
>>> This patch adds support for ECMP hash policy choice via a new sysctl
>>> called fib_multipath_hash_policy and also adds support for L4 hashes.
>>> The current values for fib_multipath_hash_policy are:
>>>  0 - layer 3 (default)
>>>  1 - layer 4
>>> If there's an skb hash already set and it matches the chosen policy then it
>>> will be used instead of being calculated. The ICMP inner IP addresses use
>>> is removed.
>>>
>>> Signed-off-by: Nikolay Aleksandrov 
>>> ---
>>> v2:
>>>  - removed the output_key_hash as it's not needed anymore
>>>  - reverted to my original/internal patch with L3 as default hash
>> 
>> What about ICMP PTB (Fragmentation Needed) forwarding that makes PMTUD
>> work with ECMP in setups like described in RFC7690 [1]?
>> 
>>   ptb -> router ecmp -> next hop L4/L7 load balancer -> destination
>> 
>>router --> load balancer 1 --->
>> \\--> load balancer 2 ---> load-balanced service
>>  \--> load balancer N --->
>> 
>> Removing special treatment of ICMP errors will break it, won't it?
>> 
>
> Yes, I am aware and this decision was made with that in mind.
> We'd like to use the HW hash when available and IIRC that doesn't play well 
> with
> special-casing ICMP errors for anycast as it may not match also. Another 
> thing,
> again if I remember correctly, was that this behaviour is closer to how 
> hardware
> handles ECMP.

OK, I wanted to make sure that is not an oversight that ECMP routing in
ipv4 stack is to be dumbed down to match the hardware behavior. I
thought that it was an advantage that we want to have over hardware
routers. (To be fair, I should mention that we don't have it in ipv6
stack ATM.)

>
> One thing we can do is leave the current L3 behaviour with ICMP error handling
> and add a new L3 mode that tries to use the skb hash when available and 
> doesn't
> care about the packet type.
>
> What do you think ?

Sounds good to me. Would be good to hear other opinions also.

Thanks,
Jakub


Re: [PATCH net-next v2] net: ipv4: add support for ECMP hash policy choice

2017-03-08 Thread Nikolay Aleksandrov
On 08/03/17 14:05, Jakub Sitnicki wrote:
> Hi Nikolay,
> 
> On Tue, Mar 07, 2017 at 11:01 AM GMT, Nikolay Aleksandrov wrote:
>> This patch adds support for ECMP hash policy choice via a new sysctl
>> called fib_multipath_hash_policy and also adds support for L4 hashes.
>> The current values for fib_multipath_hash_policy are:
>>  0 - layer 3 (default)
>>  1 - layer 4
>> If there's an skb hash already set and it matches the chosen policy then it
>> will be used instead of being calculated. The ICMP inner IP addresses use
>> is removed.
>>
>> Signed-off-by: Nikolay Aleksandrov 
>> ---
>> v2:
>>  - removed the output_key_hash as it's not needed anymore
>>  - reverted to my original/internal patch with L3 as default hash
> 
> What about ICMP PTB (Fragmentation Needed) forwarding that makes PMTUD
> work with ECMP in setups like described in RFC7690 [1]?
> 
>   ptb -> router ecmp -> next hop L4/L7 load balancer -> destination
> 
>router --> load balancer 1 --->
> \\--> load balancer 2 ---> load-balanced service
>  \--> load balancer N --->
> 
> Removing special treatment of ICMP errors will break it, won't it?
> 

Yes, I am aware and this decision was made with that in mind.
We'd like to use the HW hash when available and IIRC that doesn't play well with
special-casing ICMP errors for anycast as it may not match also. Another thing,
again if I remember correctly, was that this behaviour is closer to how hardware
handles ECMP.

One thing we can do is leave the current L3 behaviour with ICMP error handling
and add a new L3 mode that tries to use the skb hash when available and doesn't
care about the packet type.

What do you think ?

> FWIW, I gave a run to your patch (default settings, L3 hash) with a test
> script [2] that simulates such a setup and it confirmed my worries - PTB
> errors don't travel back to the source host any more.

Thanks for testing.

> 
> -Jakub
> 
> [1] https://tools.ietf.org/html/rfc7690#section-2
> [2] 
> https://github.com/jsitnicki/tools/commit/ccb85e68421df4ffd8b7abf00f6f5fe1c6a5af76
> 



Re: [PATCH net-next v2] net: ipv4: add support for ECMP hash policy choice

2017-03-08 Thread Jakub Sitnicki
Hi Nikolay,

On Tue, Mar 07, 2017 at 11:01 AM GMT, Nikolay Aleksandrov wrote:
> This patch adds support for ECMP hash policy choice via a new sysctl
> called fib_multipath_hash_policy and also adds support for L4 hashes.
> The current values for fib_multipath_hash_policy are:
>  0 - layer 3 (default)
>  1 - layer 4
> If there's an skb hash already set and it matches the chosen policy then it
> will be used instead of being calculated. The ICMP inner IP addresses use
> is removed.
>
> Signed-off-by: Nikolay Aleksandrov 
> ---
> v2:
>  - removed the output_key_hash as it's not needed anymore
>  - reverted to my original/internal patch with L3 as default hash

What about ICMP PTB (Fragmentation Needed) forwarding that makes PMTUD
work with ECMP in setups like described in RFC7690 [1]?

  ptb -> router ecmp -> next hop L4/L7 load balancer -> destination

   router --> load balancer 1 --->
\\--> load balancer 2 ---> load-balanced service
 \--> load balancer N --->

Removing special treatment of ICMP errors will break it, won't it?

FWIW, I gave a run to your patch (default settings, L3 hash) with a test
script [2] that simulates such a setup and it confirmed my worries - PTB
errors don't travel back to the source host any more.

-Jakub

[1] https://tools.ietf.org/html/rfc7690#section-2
[2] 
https://github.com/jsitnicki/tools/commit/ccb85e68421df4ffd8b7abf00f6f5fe1c6a5af76


[PATCH net-next v2] net: ipv4: add support for ECMP hash policy choice

2017-03-07 Thread Nikolay Aleksandrov
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
 0 - layer 3 (default)
 1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated. The ICMP inner IP addresses use
is removed.

Signed-off-by: Nikolay Aleksandrov 
---
v2:
 - removed the output_key_hash as it's not needed anymore
 - reverted to my original/internal patch with L3 as default hash

 Documentation/networking/ip-sysctl.txt |  8 +++
 include/net/ip_fib.h   | 14 ++---
 include/net/netns/ipv4.h   |  1 +
 include/net/route.h|  9 +---
 net/ipv4/fib_semantics.c   | 11 ++--
 net/ipv4/icmp.c| 19 +--
 net/ipv4/route.c   | 95 ++
 net/ipv4/sysctl_net_ipv4.c |  9 
 8 files changed, 78 insertions(+), 88 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index fc73eeb7b3b8..558e19653106 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -73,6 +73,14 @@ fib_multipath_use_neigh - BOOLEAN
0 - disabled
1 - enabled
 
+fib_multipath_hash_policy - INTEGER
+   Controls which hash policy to use for multipath routes. Only valid
+   for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled.
+   Default: 0 (Layer 3)
+   Possible values:
+   0 - Layer 3
+   1 - Layer 4
+
 route/max_size - INTEGER
Maximum number of routes allowed in the kernel.  Increase
this when using large numbers of interfaces and/or routes.
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 368bb4024b78..8ac9bec053c5 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -371,17 +371,13 @@ int fib_sync_down_dev(struct net_device *dev, unsigned 
long event, bool force);
 int fib_sync_down_addr(struct net_device *dev, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 
-extern u32 fib_multipath_secret __read_mostly;
-
-static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
-{
-   return jhash_2words((__force u32)saddr, (__force u32)daddr,
-   fib_multipath_secret) >> 1;
-}
-
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
+  const struct sk_buff *skb);
+#endif
 void fib_select_multipath(struct fib_result *res, int hash);
 void fib_select_path(struct net *net, struct fib_result *res,
-struct flowi4 *fl4, int mp_hash);
+struct flowi4 *fl4);
 
 /* Exported by fib_trie.c */
 void fib_trie_init(void);
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 622d2da27135..70a1d4251790 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -152,6 +152,7 @@ struct netns_ipv4 {
 #endif
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
int sysctl_fib_multipath_use_neigh;
+   int sysctl_fib_multipath_hash_policy;
 #endif
 
unsigned intfib_seq;/* protected by rtnl_mutex */
diff --git a/include/net/route.h b/include/net/route.h
index c0874c87c173..32412c91c222 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -113,14 +113,7 @@ struct in_device;
 int ip_rt_init(void);
 void rt_cache_flush(struct net *net);
 void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key_hash(struct net *, struct flowi4 *flp,
- int mp_hash);
-
-static inline struct rtable *__ip_route_output_key(struct net *net,
-  struct flowi4 *flp)
-{
-   return __ip_route_output_key_hash(net, flp, -1);
-}
+struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp);
 
 struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
const struct sock *sk);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 317026a39cfa..6601bd9744c9 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -57,7 +57,6 @@ static unsigned int fib_info_cnt;
 static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-u32 fib_multipath_secret __read_mostly;
 
 #define for_nexthops(fi) { \
int nhsel; const struct fib_nh *nh; \
@@ -576,9 +575,6 @@ static void fib_rebalance(struct fib_info *fi)
 
atomic_set(&nexthop_nh->nh_upper_bound, upper_bound);
} endfor_nexthops(fi);
-
-   net_get_random_once(&fib_multipath_secret,
-   sizeof(fib_multipath_secret));
 }
 
 static inline void fib_add_weight(struct fib_info *fi,
@@ -1641,