Re: xfrm4_garbage_collect reaching limit

2015-09-30 Thread Steffen Klassert
On Mon, Sep 21, 2015 at 10:51:11AM -0400, Dan Streetman wrote:
> On Fri, Sep 18, 2015 at 1:00 AM, Dan Streetman  wrote:
> > On Wed, Sep 16, 2015 at 4:45 AM, Steffen Klassert
> >  wrote:
> >>
> >> What about the patch below? With this we are independent of the number
> >> of cpus. It should cover most, if not all use cases.
> >
> > yep that works, thanks!  I'll give it a test also, but I don't see how
> > it would fail.
> 
> Yep, on a test setup that previously failed within several hours, it
> ran over the weekend successfully.  Thanks!
> 
> Tested-by: Dan Streetman 
> 
> >
> >>
> >> While we are at it, we could think about increasing the flowcache
> >> percpu limit. This value was chosen back in 2003, so maybe we could
> >> have more than 4k cache entries per cpu these days.
> >>
> >>
> >> Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.
> >>
> >> The xfrm flowcache size is limited by the flowcache limit
> >> (4096 * number of online cpus) and the xfrm garbage collector
> >> threshold (2 * 32768), whichever is reached first. This means
> >> that we can hit the garbage collector limit only on systems
> >> with more than 16 cpus. On such systems we simply refuse
> >> new allocations if we reach the limit, so new flows are dropped.
> >> On systems with 16 or fewer cpus, we hit the flowcache limit.
> >> In this case, we shrink the flow cache instead of refusing new
> >> flows.
> >>
> >> We increase the xfrm garbage collector threshold to INT_MAX
> >> to get the same behaviour, independent of the number of cpus.
> >>
> >> The xfrm garbage collector threshold can still be set below
> >> the flowcache limit to reduce the memory usage of the flowcache.
> >>
> >> Signed-off-by: Steffen Klassert 

I've applied this to ipsec-next now. It could be considered a fix too,
but we can still tweak the value via the sysctl in the meantime, so it
is better to test it a bit longer before it hits mainline.

Thanks a lot for your work Dan!


Re: xfrm4_garbage_collect reaching limit

2015-09-21 Thread Dan Streetman
On Wed, Sep 16, 2015 at 4:45 AM, Steffen Klassert
 wrote:
> On Mon, Sep 14, 2015 at 11:14:59PM -0400, Dan Streetman wrote:
>> On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
>>  wrote:
>> >
>> >> Possibly the
>> >> default value of xfrm4_gc_thresh could be set proportional to
>> >> num_online_cpus(), but that doesn't help when cpus are onlined after
>> >> boot.
>> >
>> > This could be an option, we could change the xfrm4_gc_thresh value with
>> > a cpu notifier callback if more cpus come up after boot.
>>
>> the issue there is, if the value is changed by the user, does a cpu
>> hotplug reset it back to default...
>
> What about the patch below? With this we are independent of the number
> of cpus. It should cover most, if not all use cases.
>
> While we are at it, we could think about increasing the flowcache
> percpu limit. This value was chosen back in 2003, so maybe we could
> have more than 4k cache entries per cpu these days.

Sounds reasonable, though I don't have any data on a good value, so
I'll leave that up to you :-)

>
>
> Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.
>
> The xfrm flowcache size is limited by the flowcache limit
> (4096 * number of online cpus) and the xfrm garbage collector
> threshold (2 * 32768), whichever is reached first. This means
> that we can hit the garbage collector limit only on systems
> with more than 16 cpus. On such systems we simply refuse
> new allocations if we reach the limit, so new flows are dropped.
> On systems with 16 or fewer cpus, we hit the flowcache limit.
> In this case, we shrink the flow cache instead of refusing new
> flows.
>
> We increase the xfrm garbage collector threshold to INT_MAX
> to get the same behaviour, independent of the number of cpus.
>
> The xfrm garbage collector threshold can still be set below
> the flowcache limit to reduce the memory usage of the flowcache.
>
> Signed-off-by: Steffen Klassert 
> ---
>  Documentation/networking/ip-sysctl.txt | 6 ++++--
>  net/ipv4/xfrm4_policy.c                | 2 +-
>  net/ipv6/xfrm6_policy.c                | 2 +-
>  3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index ebe94f2..260f30b 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1199,7 +1199,8 @@ tag - INTEGER
>  xfrm4_gc_thresh - INTEGER
> The threshold at which we will start garbage collecting for IPv4
> destination cache entries.  At twice this value the system will
> -   refuse new allocations.
> +   refuse new allocations. The value must be set below the flowcache
> +   limit (4096 * number of online cpus) to take effect.
>
>  igmp_link_local_mcast_reports - BOOLEAN
> Enable IGMP reports for link local multicast groups in the
> @@ -1645,7 +1646,8 @@ ratelimit - INTEGER
>  xfrm6_gc_thresh - INTEGER
> The threshold at which we will start garbage collecting for IPv6
> destination cache entries.  At twice this value the system will
> -   refuse new allocations.
> +   refuse new allocations. The value must be set below the flowcache
> +   limit (4096 * number of online cpus) to take effect.
>
>
>  IPv6 Update by:
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index 1e06c4f..3dffc73 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
> .destroy =  xfrm4_dst_destroy,
> .ifdown =   xfrm4_dst_ifdown,
> .local_out =__ip_local_out,
> -   .gc_thresh =32768,
> +   .gc_thresh =INT_MAX,
>  };
>
>  static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> index f10b940..e9af39a 100644
> --- a/net/ipv6/xfrm6_policy.c
> +++ b/net/ipv6/xfrm6_policy.c
> @@ -289,7 +289,7 @@ static struct dst_ops xfrm6_dst_ops = {
> .destroy =  xfrm6_dst_destroy,
> .ifdown =   xfrm6_dst_ifdown,
> .local_out =__ip6_local_out,
> -   .gc_thresh =32768,
> +   .gc_thresh =INT_MAX,
>  };
>
>  static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
> --
> 1.9.1
>


Re: xfrm4_garbage_collect reaching limit

2015-09-21 Thread Dan Streetman
On Fri, Sep 18, 2015 at 1:00 AM, Dan Streetman  wrote:
> On Wed, Sep 16, 2015 at 4:45 AM, Steffen Klassert
>  wrote:
>> On Mon, Sep 14, 2015 at 11:14:59PM -0400, Dan Streetman wrote:
>>> On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
>>>  wrote:
>>> >
>>> >> Possibly the
>>> >> default value of xfrm4_gc_thresh could be set proportional to
>>> >> num_online_cpus(), but that doesn't help when cpus are onlined after
>>> >> boot.
>>> >
>>> > This could be an option, we could change the xfrm4_gc_thresh value with
>>> > a cpu notifier callback if more cpus come up after boot.
>>>
>>> the issue there is, if the value is changed by the user, does a cpu
>>> hotplug reset it back to default...
>>
>> What about the patch below? With this we are independent of the number
>> of cpus. It should cover most, if not all use cases.
>
> yep that works, thanks!  I'll give it a test also, but I don't see how
> it would fail.

Yep, on a test setup that previously failed within several hours, it
ran over the weekend successfully.  Thanks!

Tested-by: Dan Streetman 

>
>>
>> While we are at it, we could think about increasing the flowcache
>> percpu limit. This value was chosen back in 2003, so maybe we could
>> have more than 4k cache entries per cpu these days.
>>
>>
>> Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.
>>
>> The xfrm flowcache size is limited by the flowcache limit
>> (4096 * number of online cpus) and the xfrm garbage collector
>> threshold (2 * 32768), whichever is reached first. This means
>> that we can hit the garbage collector limit only on systems
>> with more than 16 cpus. On such systems we simply refuse
>> new allocations if we reach the limit, so new flows are dropped.
>> On systems with 16 or fewer cpus, we hit the flowcache limit.
>> In this case, we shrink the flow cache instead of refusing new
>> flows.
>>
>> We increase the xfrm garbage collector threshold to INT_MAX
>> to get the same behaviour, independent of the number of cpus.
>>
>> The xfrm garbage collector threshold can still be set below
>> the flowcache limit to reduce the memory usage of the flowcache.
>>
>> Signed-off-by: Steffen Klassert 
>> ---
>>  Documentation/networking/ip-sysctl.txt | 6 ++++--
>>  net/ipv4/xfrm4_policy.c                | 2 +-
>>  net/ipv6/xfrm6_policy.c                | 2 +-
>>  3 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt 
>> b/Documentation/networking/ip-sysctl.txt
>> index ebe94f2..260f30b 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -1199,7 +1199,8 @@ tag - INTEGER
>>  xfrm4_gc_thresh - INTEGER
>> The threshold at which we will start garbage collecting for IPv4
>> destination cache entries.  At twice this value the system will
>> -   refuse new allocations.
>> +   refuse new allocations. The value must be set below the flowcache
>> +   limit (4096 * number of online cpus) to take effect.
>>
>>  igmp_link_local_mcast_reports - BOOLEAN
>> Enable IGMP reports for link local multicast groups in the
>> @@ -1645,7 +1646,8 @@ ratelimit - INTEGER
>>  xfrm6_gc_thresh - INTEGER
>> The threshold at which we will start garbage collecting for IPv6
>> destination cache entries.  At twice this value the system will
>> -   refuse new allocations.
>> +   refuse new allocations. The value must be set below the flowcache
>> +   limit (4096 * number of online cpus) to take effect.
>>
>>
>>  IPv6 Update by:
>> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
>> index 1e06c4f..3dffc73 100644
>> --- a/net/ipv4/xfrm4_policy.c
>> +++ b/net/ipv4/xfrm4_policy.c
>> @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
>> .destroy =  xfrm4_dst_destroy,
>> .ifdown =   xfrm4_dst_ifdown,
>> .local_out =__ip_local_out,
>> -   .gc_thresh =32768,
>> +   .gc_thresh =INT_MAX,
>>  };
>>
>>  static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
>> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
>> index f10b940..e9af39a 100644
>> --- a/net/ipv6/xfrm6_policy.c
>> +++ b/net/ipv6/xfrm6_policy.c
>> @@ -289,7 +289,7 @@ static struct dst_ops xfrm6_dst_ops = {
>> .destroy =  xfrm6_dst_destroy,
>> .ifdown =   xfrm6_dst_ifdown,
>> .local_out =__ip6_local_out,
>> -   .gc_thresh =32768,
>> +   .gc_thresh =INT_MAX,
>>  };
>>
>>  static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
>> --
>> 1.9.1
>>


Re: xfrm4_garbage_collect reaching limit

2015-09-17 Thread Dan Streetman
On Wed, Sep 16, 2015 at 4:45 AM, Steffen Klassert
 wrote:
> On Mon, Sep 14, 2015 at 11:14:59PM -0400, Dan Streetman wrote:
>> On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
>>  wrote:
>> >
>> >> Possibly the
>> >> default value of xfrm4_gc_thresh could be set proportional to
>> >> num_online_cpus(), but that doesn't help when cpus are onlined after
>> >> boot.
>> >
>> > This could be an option, we could change the xfrm4_gc_thresh value with
>> > a cpu notifier callback if more cpus come up after boot.
>>
>> the issue there is, if the value is changed by the user, does a cpu
>> hotplug reset it back to default...
>
> What about the patch below? With this we are independent of the number
> of cpus. It should cover most, if not all use cases.

yep that works, thanks!  I'll give it a test also, but I don't see how
it would fail.

>
> While we are at it, we could think about increasing the flowcache
> percpu limit. This value was chosen back in 2003, so maybe we could
> have more than 4k cache entries per cpu these days.
>
>
> Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.
>
> The xfrm flowcache size is limited by the flowcache limit
> (4096 * number of online cpus) and the xfrm garbage collector
> threshold (2 * 32768), whichever is reached first. This means
> that we can hit the garbage collector limit only on systems
> with more than 16 cpus. On such systems we simply refuse
> new allocations if we reach the limit, so new flows are dropped.
> On systems with 16 or fewer cpus, we hit the flowcache limit.
> In this case, we shrink the flow cache instead of refusing new
> flows.
>
> We increase the xfrm garbage collector threshold to INT_MAX
> to get the same behaviour, independent of the number of cpus.
>
> The xfrm garbage collector threshold can still be set below
> the flowcache limit to reduce the memory usage of the flowcache.
>
> Signed-off-by: Steffen Klassert 
> ---
>  Documentation/networking/ip-sysctl.txt | 6 ++++--
>  net/ipv4/xfrm4_policy.c                | 2 +-
>  net/ipv6/xfrm6_policy.c                | 2 +-
>  3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index ebe94f2..260f30b 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1199,7 +1199,8 @@ tag - INTEGER
>  xfrm4_gc_thresh - INTEGER
> The threshold at which we will start garbage collecting for IPv4
> destination cache entries.  At twice this value the system will
> -   refuse new allocations.
> +   refuse new allocations. The value must be set below the flowcache
> +   limit (4096 * number of online cpus) to take effect.
>
>  igmp_link_local_mcast_reports - BOOLEAN
> Enable IGMP reports for link local multicast groups in the
> @@ -1645,7 +1646,8 @@ ratelimit - INTEGER
>  xfrm6_gc_thresh - INTEGER
> The threshold at which we will start garbage collecting for IPv6
> destination cache entries.  At twice this value the system will
> -   refuse new allocations.
> +   refuse new allocations. The value must be set below the flowcache
> +   limit (4096 * number of online cpus) to take effect.
>
>
>  IPv6 Update by:
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index 1e06c4f..3dffc73 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
> .destroy =  xfrm4_dst_destroy,
> .ifdown =   xfrm4_dst_ifdown,
> .local_out =__ip_local_out,
> -   .gc_thresh =32768,
> +   .gc_thresh =INT_MAX,
>  };
>
>  static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> index f10b940..e9af39a 100644
> --- a/net/ipv6/xfrm6_policy.c
> +++ b/net/ipv6/xfrm6_policy.c
> @@ -289,7 +289,7 @@ static struct dst_ops xfrm6_dst_ops = {
> .destroy =  xfrm6_dst_destroy,
> .ifdown =   xfrm6_dst_ifdown,
> .local_out =__ip6_local_out,
> -   .gc_thresh =32768,
> +   .gc_thresh =INT_MAX,
>  };
>
>  static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
> --
> 1.9.1
>


Re: xfrm4_garbage_collect reaching limit

2015-09-17 Thread Steffen Klassert
On Thu, Sep 17, 2015 at 09:23:35PM -0700, David Miller wrote:
> From: Steffen Klassert 
> Date: Wed, 16 Sep 2015 10:45:41 +0200
> 
> > index 1e06c4f..3dffc73 100644
> > --- a/net/ipv4/xfrm4_policy.c
> > +++ b/net/ipv4/xfrm4_policy.c
> > @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
> > .destroy =  xfrm4_dst_destroy,
> > .ifdown =   xfrm4_dst_ifdown,
> > .local_out =__ip_local_out,
> > -   .gc_thresh =32768,
> > +   .gc_thresh =INT_MAX,
> >  };
> >  
> >  static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
> 
> This means the dst_ops->gc() for xfrm will never be invoked.
> 
> Is that intentional?

Yes. This is already the case on systems with fewer than 8 cpus
because the flowcache is limited to 4096 entries per cpu. The
percpu flowcache shrinks itself to 'low_watermark' entries if
it hits the percpu limit.
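
As a purely conceptual, userspace-only illustration of that difference
(the 4096 limit comes from this thread; the shrink target and the names
are assumptions, and none of this is the kernel's flow.c code): the
per-cpu flowcache evicts entries back to a low watermark instead of
refusing the new flow, so insertion always succeeds.

#include <stdio.h>

/* Illustrative constants: 4096 matches the per-cpu flowcache limit
 * discussed in this thread; the shrink target is an assumption. */
#define PER_CPU_HIGH_WATERMARK 4096
#define PER_CPU_LOW_WATERMARK  (PER_CPU_HIGH_WATERMARK / 2)

/* Insert a flow into a (simulated) per-cpu cache. Unlike the xfrm gc
 * threshold, hitting the limit never refuses the new flow; the cache
 * shrinks itself back to the low watermark instead. */
static void flowcache_insert(unsigned int *nr_entries)
{
        if (*nr_entries >= PER_CPU_HIGH_WATERMARK)
                *nr_entries = PER_CPU_LOW_WATERMARK;    /* evict, don't refuse */
        (*nr_entries)++;
}

int main(void)
{
        unsigned int entries = 0;

        for (int i = 0; i < 100000; i++)
                flowcache_insert(&entries);

        printf("cache holds %u entries after 100000 inserts\n", entries);
        return 0;
}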


Re: xfrm4_garbage_collect reaching limit

2015-09-17 Thread David Miller
From: Steffen Klassert 
Date: Wed, 16 Sep 2015 10:45:41 +0200

> index 1e06c4f..3dffc73 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
>   .destroy =  xfrm4_dst_destroy,
>   .ifdown =   xfrm4_dst_ifdown,
>   .local_out =__ip_local_out,
> - .gc_thresh =32768,
> + .gc_thresh =INT_MAX,
>  };
>  
>  static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {

This means the dst_ops->gc() for xfrm will never be invoked.

Is that intentional?


Re: xfrm4_garbage_collect reaching limit

2015-09-16 Thread Steffen Klassert
On Mon, Sep 14, 2015 at 11:14:59PM -0400, Dan Streetman wrote:
> On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
>  wrote:
> >
> >> Possibly the
> >> default value of xfrm4_gc_thresh could be set proportional to
> >> num_online_cpus(), but that doesn't help when cpus are onlined after
> >> boot.
> >
> > This could be an option, we could change the xfrm4_gc_thresh value with
> > a cpu notifier callback if more cpus come up after boot.
> 
> the issue there is, if the value is changed by the user, does a cpu
> hotplug reset it back to default...

What about the patch below? With this we are independent of the number
of cpus. It should cover most, if not all use cases.

While we are at it, we could think about increasing the flowcache
percpu limit. This value was chosen back in 2003, so maybe we could
have more than 4k cache entries per cpu these days.


Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.

The xfrm flowcache size is limited by the flowcache limit
(4096 * number of online cpus) and the xfrm garbage collector
threshold (2 * 32768), whichever is reached first. This means
that we can hit the garbage collector limit only on systems
with more than 16 cpus. On such systems we simply refuse
new allocations if we reach the limit, so new flows are dropped.
On systems with 16 or fewer cpus, we hit the flowcache limit.
In this case, we shrink the flow cache instead of refusing new
flows.

We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.

The xfrm garbage collector threshold can still be set below
the flowcache limit to reduce the memory usage of the flowcache.

Signed-off-by: Steffen Klassert 
---
 Documentation/networking/ip-sysctl.txt | 6 ++++--
 net/ipv4/xfrm4_policy.c                | 2 +-
 net/ipv6/xfrm6_policy.c                | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index ebe94f2..260f30b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1199,7 +1199,8 @@ tag - INTEGER
 xfrm4_gc_thresh - INTEGER
The threshold at which we will start garbage collecting for IPv4
destination cache entries.  At twice this value the system will
-   refuse new allocations.
+   refuse new allocations. The value must be set below the flowcache
+   limit (4096 * number of online cpus) to take effect.
 
 igmp_link_local_mcast_reports - BOOLEAN
Enable IGMP reports for link local multicast groups in the
@@ -1645,7 +1646,8 @@ ratelimit - INTEGER
 xfrm6_gc_thresh - INTEGER
The threshold at which we will start garbage collecting for IPv6
destination cache entries.  At twice this value the system will
-   refuse new allocations.
+   refuse new allocations. The value must be set below the flowcache
+   limit (4096 * number of online cpus) to take effect.
 
 
 IPv6 Update by:
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 1e06c4f..3dffc73 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
.destroy =  xfrm4_dst_destroy,
.ifdown =   xfrm4_dst_ifdown,
.local_out =__ip_local_out,
-   .gc_thresh =32768,
+   .gc_thresh =INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index f10b940..e9af39a 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -289,7 +289,7 @@ static struct dst_ops xfrm6_dst_ops = {
.destroy =  xfrm6_dst_destroy,
.ifdown =   xfrm6_dst_ifdown,
.local_out =__ip6_local_out,
-   .gc_thresh =32768,
+   .gc_thresh =INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
-- 
1.9.1



Re: xfrm4_garbage_collect reaching limit

2015-09-14 Thread Dan Streetman
On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
 wrote:
> Hi Dan.
>
> On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote:
>> Hi Steffen,
>>
>> I've been working with Jay on an ipsec issue, which I believe he
>> discussed with you.
>
> Yes, we talked about this at the LPC.
>
>> In this case the xfrm4_garbage_collect is
>> returning an error because the number of xfrm4 dst entries has exceeded
>> twice the gc_thresh, which causes new allocations of xfrm4 dst objects
>> to fail, thus making the ipsec connection unusable (until dst objects
>> are removed/freed).
>>
>> The main reason the count gets to the limit is that the
>> xfrm4_policy_afinfo.garbage_collect function - which points to
>> flow_cache_flush (indirectly) - doesn't actually guarantee any xfrm4
>> dst will get cleaned up; it only cleans up unused entries.
>>
>> The flow cache hashtable size limit watermark does restrict how many
>> flow cache entries exist (by shrinking the per-cpu hashtable once it
>> has 4k entries), and therefore indirectly controls the total number of
>> xfrm4 dst objects.  However, there's a mismatch between the default
>> xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
>> objects) - and the flow cache hashtable limit of 4k objects per cpu.
>> Any system with 16 or fewer cpus will have a total limit of 64k (or
>> less) flow cache entries, so the 64k xfrm4 dst entry limit will never
>> be reached.  However, for any system with more than 16 cpus, the flow
>> cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst
>> allocation can fail, rendering the ipsec connection unusable.
>>
>> The most obvious solution is for the system admin to increase the
>> xfrm4_gc_thresh value, although it's not obvious to the end user
>> what value they should set it to :-)
>
> Yes, a static gc threshold is always wrong for some workloads. So
> the user needs to adjust it to his needs, even if the right value
> is not obvious.
>
>> Possibly the
>> default value of xfrm4_gc_thresh could be set proportional to
>> num_online_cpus(), but that doesn't help when cpus are onlined after
>> boot.
>
> This could be an option, we could change the xfrm4_gc_thresh value with
> a cpu notifier callback if more cpus come up after boot.

the issue there is, if the value is changed by the user, does a cpu
hotplug reset it back to default...

>
>> Also, a warning message indicating the xfrm4_gc_thresh limit
>> was reached, and a suggestion to increase the limit, may help anyone
>> who hits the issue.

what do you think about this?  it's the simplest option; something like

pr_warn_ratelimited("xfrm4_gc_limit exceeded\n");

or with a suggestion...

pr_warn_ratelimited("xfrm4_gc_limit exceeded, you may want to increase
to %d or more",
  2048 * num_online_cpus());
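
For context, roughly where such a warning could sit in
net/ipv4/xfrm4_policy.c. This is a hedged sketch only: the surrounding
function body is assumed/elided, and the only facts taken from this
thread are the 2 * gc_thresh refusal point, the net.ipv4.xfrm4_gc_thresh
sysctl, and the 4096-per-cpu flowcache limit (hence the illustrative
2048 * num_online_cpus() suggestion).

static int xfrm4_garbage_collect(struct dst_ops *ops)
{
        /* ... existing flowcache garbage collection elided ... */

        if (dst_entries_get_slow(ops) > 2 * ops->gc_thresh) {
                /* Suggested value is illustrative: the per-cpu flowcache
                 * limit (4096) divided by the factor of two used above. */
                pr_warn_ratelimited("xfrm4: gc threshold exceeded, consider raising net.ipv4.xfrm4_gc_thresh to %u or more\n",
                                    2048 * num_online_cpus());
                return 1;       /* refuse new allocations, as before */
        }

        return 0;
}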

>>
>> I'm not sure if something more aggressive is appropriate, like
>> removing active entries during garbage collection.
>
> It would not make too much sense to push an active flow out of the
> fastpath just to add some other flow. If the number of active
> entries is too high, there is no other option than increasing the
> gc threshold.
>
> You could try to reduce the number of active entries by shutting
> down stale security associations frequently.
>
>> Or, removing the
>> failure condition from xfrm4_garbage_collect so xfrm4 dst_ops can
>> always be allocated,
>
> This would open doors for DoS attacks; we can't do this.
>
>> or just increasing it from gc_thresh * 2 up to gc_thresh * 4 or more.
>
> This would just defer the problem, so not a real solution.
>
> That said, whatever we do, we just paper over the real problem,
> which is the flowcache itself. Everything that needs this kind
> of garbage collecting is fundamentally broken. But as long as
> nobody volunteers to work on a replacement, we have to live
> with this situation somehow.
>


Re: xfrm4_garbage_collect reaching limit

2015-09-11 Thread Steffen Klassert
Hi Dan.

On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote:
> Hi Steffen,
> 
> I've been working with Jay on an ipsec issue, which I believe he
> discussed with you.  

Yes, we talked about this at the LPC.

> In this case the xfrm4_garbage_collect is
> returning an error because the number of xfrm4 dst entries has exceeded
> twice the gc_thresh, which causes new allocations of xfrm4 dst objects
> to fail, thus making the ipsec connection unusable (until dst objects
> are removed/freed).
> 
> The main reason the count gets to the limit is that the
> xfrm4_policy_afinfo.garbage_collect function - which points to
> flow_cache_flush (indirectly) - doesn't actually guarantee any xfrm4
> dst will get cleaned up; it only cleans up unused entries.
> 
> The flow cache hashtable size limit watermark does restrict how many
> flow cache entries exist (by shrinking the per-cpu hashtable once it
> has 4k entries), and therefore indirectly controls the total number of
> xfrm4 dst objects.  However, there's a mismatch between the default
> xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
> objects) - and the flow cache hashtable limit of 4k objects per cpu.
> Any system with 16 or fewer cpus will have a total limit of 64k (or
> less) flow cache entries, so the 64k xfrm4 dst entry limit will never
> be reached.  However, for any system with more than 16 cpus, the flow
> cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst
> allocation can fail, rendering the ipsec connection unusable.
> 
> The most obvious solution is for the system admin to increase the
> xfrm4_gc_thresh value, although it's not obvious to the end user
> what value they should set it to :-)

Yes, a static gc threshold is always wrong for some workloads. So
the user needs to adjust it to his needs, even if the right value
is not obvious.

> Possibly the
> default value of xfrm4_gc_thresh could be set proportional to
> num_online_cpus(), but that doesn't help when cpus are onlined after
> boot.  

This could be an option, we could change the xfrm4_gc_thresh value with
a cpu notifier callback if more cpus come up after boot.
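
A minimal sketch of what such a notifier could look like, assuming the
2015-era cpu hotplug notifier API. The scaling factor is illustrative,
and note that this unconditionally overwrites the threshold, so it would
also clobber a value the admin had set via sysctl, which is exactly the
concern Dan raises in his reply above.

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/notifier.h>

/* Illustrative: 4096 per-cpu flowcache entries, refusal at 2 * gc_thresh,
 * hence gc_thresh = 2048 * online cpus. */
#define XFRM4_GC_PER_CPU 2048

static int xfrm4_gc_cpu_callback(struct notifier_block *nb,
                                 unsigned long action, void *hcpu)
{
        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_ONLINE:
        case CPU_DEAD:
                /* xfrm4_dst_ops is the static dst_ops in net/ipv4/xfrm4_policy.c
                 * shown in the patch; this clobbers any admin-set sysctl value. */
                xfrm4_dst_ops.gc_thresh = XFRM4_GC_PER_CPU * num_online_cpus();
                break;
        }

        return NOTIFY_OK;
}

static struct notifier_block xfrm4_gc_cpu_notifier = {
        .notifier_call = xfrm4_gc_cpu_callback,
};

/* Registered once at init time, e.g.:
 *      register_hotcpu_notifier(&xfrm4_gc_cpu_notifier);
 */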

> Also, a warning message indicating the xfrm4_gc_thresh limit
> was reached, and a suggestion to increase the limit, may help anyone
> who hits the issue.
> 
> I'm not sure if something more aggressive is appropriate, like
> removing active entries during garbage collection. 

It would not make too much sense to push an active flow out of the
fastpath just to add some other flow. If the number of active
entries is too high, there is no other option than increasing the
gc threshold.

You could try to reduce the number of active entries by shutting
down stale security associations frequently.

> Or, removing the
> failure condition from xfrm4_garbage_collect so xfrm4 dst_ops can
> always be allocated,

This would open doors for DoS attacks; we can't do this.

> or just increasing it from gc_thresh * 2 up to gc_thresh * 4 or more.

This would just defer the problem, so not a real solution.

That said, whatever we do, we just paper over the real problem,
which is the flowcache itself. Everything that needs this kind
of garbage collecting is fundamentally broken. But as long as
nobody volunteers to work on a replacement, we have to live
with this situation somehow.



xfrm4_garbage_collect reaching limit

2015-09-10 Thread Dan Streetman
Hi Steffen,

I've been working with Jay on an ipsec issue, which I believe he
discussed with you.  In this case the xfrm4_garbage_collect is
returning an error because the number of xfrm4 dst entries has exceeded
twice the gc_thresh, which causes new allocations of xfrm4 dst objects
to fail, thus making the ipsec connection unusable (until dst objects
are removed/freed).

The main reason the count gets to the limit is that the
xfrm4_policy_afinfo.garbage_collect function - which points to
flow_cache_flush (indirectly) - doesn't actually guarantee any xfrm4
dst will get cleaned up; it only cleans up unused entries.

The flow cache hashtable size limit watermark does restrict how many
flow cache entries exist (by shrinking the per-cpu hashtable once it
has 4k entries), and therefore indirectly controls the total number of
xfrm4 dst objects.  However, there's a mismatch between the default
xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
objects) - and the flow cache hashtable limit of 4k objects per cpu.
Any system with 16 or fewer cpus will have a total limit of 64k (or
less) flow cache entries, so the 64k xfrm4 dst entry limit will never
be reached.  However, for any system with more than 16 cpus, the flow
cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst
allocation can fail, rendering the ipsec connection unusable.
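
To make the crossover concrete, here is a small userspace illustration
of the arithmetic above; nothing here is kernel code, and the constants
are just the defaults discussed in this thread (4096 flowcache entries
per cpu, gc_thresh of 32768, refusal at twice that).

#include <stdio.h>

int main(void)
{
        const long gc_thresh = 32768;           /* default xfrm4/xfrm6 gc_thresh */
        const long hard_limit = 2 * gc_thresh;  /* 65536: allocations refused above this */
        const long per_cpu = 4096;              /* flowcache entries per cpu */

        for (int cpus = 4; cpus <= 64; cpus *= 2) {
                long flowcache_limit = per_cpu * cpus;

                printf("%2d cpus: flowcache limit %6ld vs xfrm hard limit %ld -> %s\n",
                       cpus, flowcache_limit, hard_limit,
                       flowcache_limit > hard_limit ?
                       "xfrm refuses allocations first" :
                       "flowcache shrinks itself first");
        }
        return 0;
}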

The most obvious solution is for the system admin to increase the
xfrm4_gc_thresh value, although it's not obvious to the end user what
value they should set it to :-)  Possibly the
default value of xfrm4_gc_thresh could be set proportional to
num_online_cpus(), but that doesn't help when cpus are onlined after
boot.  Also, a warning message indicating the xfrm4_gc_thresh limit
was reached, and a suggestion to increase the limit, may help anyone
who hits the issue.

I'm not sure if something more aggressive is appropriate, like
removing active entries during garbage collection.  Or, removing the
failure condition from xfrm4_garbage_collect so xfrm4 dst_ops can
always be allocated, or just increasing it from gc_thresh * 2 up to
gc_thresh * 4 or more.

Also, I refer to xfrm4 above, but I believe this will affect xfrm6 as well.

Any thoughts and/or suggestions?

Thanks!