On Fri, Apr 24, 2020 at 3:06 AM Eelco Chaudron <echau...@redhat.com> wrote:
>
>
>
> On 23 Apr 2020, at 17:32, William Tu wrote:
>
> > On Thu, Apr 23, 2020 at 6:29 AM Eelco Chaudron <echau...@redhat.com>
> > wrote:
> >>
> >> Hi Ben et al.
> >>
> >> We recently had an issue where OVS would crash as it was running out
> >> of stack space processing an OVN flow loop :)  I was hoping it would
> >> jump out of the loop, but due to the change "790c5d269 ofproto-dpif:
> >> Do not count resubmit to later tables against limit." the resubmit
> >> loop can be up to 4K.
> >>
> >> When the clone action is used (and others) the stack size increases
> >> quite drastically; some tests showed that over 19M was needed to
> >> reach the 4K limit. Even a simple resubmit to resubmit jump back and
> >> forth till 4K is reached requires a 3.5M stack size.
> >>
> >> Some small changes, like doing malloc for mf_subvalue, and
> >> actset_stub in clone_xlate_actions() allowed the worst case to go
> >> from around 19M to 12M, but still, this is a lot of stack memory.
> >>
> >> One idea could be that on the last action in the list we try to
> >> unwind the stack (recursion) to the previous nonfinal action and then
> >> continue processing this action. I'm not too familiar with the xlate
> >> code, but it looks quite complex already, so not sure if this is an
> >> option :) Also not sure if this gives us enough relief in all the OVN
> >> scenarios, as they use a lot of resubmits in a single action list.
> >>
> >> Another idea Dumitru had was to delay clone() execution until you get
> >> back to the root actionset. So when you hit a clone() action you
> >> store the state in the ctx, and then go over the list once you return
> >> (this could result in a growing list). But you do not end up
> >> processing the clones on the branch of the tree. The only problem is
> >> that this results in out-of-order processing of the action list, i.e.
> >> the actions "clone(resubmit(,5)), 2" will first send out the packet
> >> on 2 and then to the destination in the clone() action. I guess this
> >> is a blocking thing, as the OpenFlow specification specifies action
> >> lists should be executed in order.
> >>
> >> Any other ideas on the above or on how to optimize the stack usage?
> >>
> > Hi Eelco,
> > Can we just increase the stack space to a larger value?
> > Ex: setting ulimit -s to 32MB
> > William
>
> You are right, I should have explained in the problem statement why
> this might not be desired.
>
> Let's assume you have a system with 56 cores. In this case, you will get
> roughly 56 threads, all taking 32M, so 1.7G. To make it even worse we
> run OVS with the mlockall() option so all memory gets reserved and pinned
> in memory...
>
I see, thanks!
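
To make the arithmetic above concrete, here is a tiny C sketch of the
worst case where every thread's full stack ends up locked in memory. The
function name locked_stack_mib() is made up for illustration, and it
assumes roughly one thread per core as in the example:

```c
#include <stdio.h>

/* Hypothetical helper: worst-case locked stack memory, in MiB, when
 * 'n_threads' threads each have a fully resident 'stack_mib' MiB stack
 * (as can happen when the process runs under mlockall()). */
static unsigned long
locked_stack_mib(unsigned long n_threads, unsigned long stack_mib)
{
    return n_threads * stack_mib;
}

/* Prints the total for a given thread count and per-thread stack size. */
static void
print_locked_stack(unsigned long n_threads, unsigned long stack_mib)
{
    unsigned long total = locked_stack_mib(n_threads, stack_mib);

    printf("%lu threads x %lu MiB = %lu MiB (%.2f GiB)\n",
           n_threads, stack_mib, total, total / 1024.0);
}
```

For 56 threads and 32 MiB each this gives 1792 MiB, i.e. 1.75 GiB, which
matches the "1.7G" above.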

> I know the number of cores can be tuned with
> n-revalidator-threads/n-handler-threads and the stack size with systemd
> (in our case). But it would be good to minimize the stack usage in
> general, so we can avoid all this setup-specific tuning.
>
Agreed, we should definitely think about how to minimize the stack.

As a short-term solution, can we use setrlimit() to set RLIMIT_STACK to
32M for the handler threads and mlock() only heap memory?
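
A rough sketch of one way to give only selected threads a larger stack.
Note that this uses pthread_attr_setstacksize() on the thread's
attributes rather than setrlimit(), since RLIMIT_STACK is a per-process
limit. The names handler_main() and run_with_stack() are made up for
illustration and this is not the actual OVS thread-creation code:

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define HANDLER_STACK_SIZE (32UL * 1024 * 1024)   /* 32 MiB, as proposed. */

/* Hypothetical thread body standing in for a handler thread.  It touches a
 * 4 MiB local buffer to mimic deep recursion using a lot of stack. */
static void *
handler_main(void *ran_)
{
    int *ran = ran_;
    char buf[4 * 1024 * 1024];

    memset(buf, 1, sizeof buf);
    *ran = buf[sizeof buf - 1];     /* Stores 1 once the buffer is filled. */
    return NULL;
}

/* Runs handler_main() in a new thread with a 'stack_size'-byte stack and
 * waits for it.  Returns 0 on success, otherwise a positive errno value. */
static int
run_with_stack(size_t stack_size, int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    int error;

    error = pthread_attr_init(&attr);
    if (error) {
        return error;
    }
    error = pthread_attr_setstacksize(&attr, stack_size);
    if (!error) {
        error = pthread_create(&tid, &attr, handler_main, ran);
    }
    pthread_attr_destroy(&attr);
    if (!error) {
        error = pthread_join(tid, NULL);
    }
    return error;
}
```

Calling run_with_stack(HANDLER_STACK_SIZE, &ran) would start one such
thread. Whether the larger stacks stay out of locked memory still depends
on how mlockall()/mlock() is used, which this sketch does not cover.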
William
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
