On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert
<steffen.klass...@secunet.com> wrote:
> Hi Dan.
>
> On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote:
>> Hi Steffen,
>>
>> I've been working with Jay on an ipsec issue, which I believe he
>> discussed with you.
>
> Yes, we talked about this at the LPC.
>
>> In this case xfrm4_garbage_collect returns an error because the
>> number of xfrm4 dst entries has exceeded twice the gc_thresh, which
>> causes new allocations of xfrm4 dst objects to fail, making the
>> ipsec connection unusable (until dst objects are removed/freed).
>>
>> The main reason the count reaches the limit is that the
>> xfrm4_policy_afinfo.garbage_collect function - which points
>> (indirectly) to flow_cache_flush - doesn't actually guarantee that
>> any xfrm4 dst will be cleaned up; it only cleans up unused entries.
>>
>> The flow cache hashtable size limit watermark does restrict how many
>> flow cache entries exist (by shrinking the per-cpu hashtable once it
>> has 4k entries), and therefore indirectly controls the total number
>> of xfrm4 dst objects. However, there's a mismatch between the
>> default xfrm4 gc_thresh of 32k objects (which sets a 64k max of
>> xfrm4 dst objects) and the flow cache hashtable limit of 4k objects
>> per cpu. Any system with 16 or fewer cpus has a total limit of 64k
>> (or fewer) flow cache entries, so the 64k xfrm4 dst entry limit will
>> never be reached. However, on any system with more than 16 cpus, the
>> flow cache limit is greater than the xfrm4 dst limit, so the xfrm4
>> dst allocation can fail, rendering the ipsec connection unusable.
>>
>> The most obvious solution is for the system admin to increase the
>> xfrm4 gc_thresh value, although it's not at all obvious to the
>> end-user what value they should set it to :-)
>
> Yes, a static gc threshold is always wrong for some workloads. So
> the user needs to adjust it to his needs, even if the right value
> is not obvious.
>
>> Possibly the default value of xfrm4_gc_thresh could be set
>> proportional to num_online_cpus(), but that doesn't help when cpus
>> are onlined after boot.
>
> This could be an option, we could change the xfrm4_gc_thresh value
> with a cpu notifier callback if more cpus come up after boot.
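to make that concrete, a minimal sketch of what such a notifier might
look like (just an illustration against the old cpu notifier API; the
xfrm4_dst_ops name, the 2048-per-cpu scaling, and the per-net details
are my guesses, not tested code):

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/notifier.h>

/* rescale the default xfrm4 gc threshold as cpus come and go, so it
 * tracks the flow cache's 4k-entries-per-cpu limit; the hard failure
 * point is gc_thresh * 2, hence 2048 * nr_cpus here.  (per-net copies
 * of the dst_ops would also need updating, glossed over here.)
 */
static int xfrm4_gc_cpu_callback(struct notifier_block *nfb,
				 unsigned long action, void *hcpu)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_ONLINE:
	case CPU_DEAD:
		xfrm4_dst_ops.gc_thresh = 2048 * num_online_cpus();
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block xfrm4_gc_cpu_notifier = {
	.notifier_call = xfrm4_gc_cpu_callback,
};

/* and from xfrm4_init() or similar:
 *	register_cpu_notifier(&xfrm4_gc_cpu_notifier);
 */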
the issue there is: if the user has changed the value, does a cpu
hotplug reset it back to the default...

>
>> Also, a warning message indicating the xfrm4_gc_thresh limit was
>> reached, and a suggestion to increase the limit, may help anyone
>> who hits the issue.

what do you think about this? it's the simplest option; something like:

	pr_warn_ratelimited("xfrm4_gc_limit exceeded\n");

or, with a suggestion:

	pr_warn_ratelimited("xfrm4_gc_limit exceeded, you may want to increase to %d or more\n",
			    2048 * num_online_cpus());

(a rough sketch of where this would sit is at the end of this mail)

>>
>> I'm not sure if something more aggressive is appropriate, like
>> removing active entries during garbage collection.
>
> It would not make too much sense to push an active flow out of the
> fastpath just to add some other flow. If the number of active
> entries is too high, there is no other option than increasing the
> gc threshold.
>
> You could try to reduce the number of active entries by shutting
> down stale security associations frequently.
>
>> Or, removing the failure condition from xfrm4_garbage_collect so
>> xfrm4 dst_ops can always be allocated,
>
> This would open the door to DoS attacks, we can't do this.
>
>> or just increasing the failure point from gc_thresh * 2 up to * 4
>> or more.
>
> This would just defer the problem, so not a real solution.
>
> That said, whatever we do, we just paper over the real problem,
> that is the flowcache itself. Everything that needs this kind
> of garbage collection is fundamentally broken. But as long as
> nobody volunteers to work on a replacement, we have to live
> with this situation somehow.
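for reference, the sketch mentioned above - roughly where the existing
check and the proposed warning would sit, based on my reading of
net/ipv4/xfrm4_policy.c (approximate, untested):

static int xfrm4_garbage_collect(struct dst_ops *ops)
{
	struct net *net = container_of(ops, struct net,
				       xfrm.xfrm4_dst_ops);

	/* flushes only *unused* flow cache entries, so this may not
	 * free any xfrm4 dsts at all */
	xfrm4_policy_afinfo.garbage_collect(net);

	/* the hard failure point: past gc_thresh * 2, every new
	 * xfrm4 dst allocation fails */
	if (dst_entries_get_slow(ops) > ops->gc_thresh * 2) {
		pr_warn_ratelimited("xfrm4_gc_limit exceeded, you may want to increase to %d or more\n",
				    2048 * num_online_cpus());
		return 1;
	}
	return 0;
}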