On Fri, Apr 24, 2020 at 3:06 AM Eelco Chaudron <echau...@redhat.com> wrote:
>
> On 23 Apr 2020, at 17:32, William Tu wrote:
>
> > On Thu, Apr 23, 2020 at 6:29 AM Eelco Chaudron <echau...@redhat.com>
> > wrote:
> >>
> >> Hi Ben et al.
> >>
> >> We recently had an issue where OVS would crash as it was running out
> >> of stack space processing an OVN flow loop :) I was hoping it would
> >> jump out of the loop, but due to the change "790c5d269 ofproto-dpif:
> >> Do not count resubmit to later tables against limit." the resubmit
> >> loop can be up to 4K deep.
> >>
> >> When the clone action (and others) is used, the stack size increases
> >> quite drastically; some tests showed that over 19M was needed to
> >> reach the 4K limit. Even a simple resubmit-to-resubmit jump back and
> >> forth until 4K is reached requires a 3.5M stack.
> >>
> >> Some small changes, like doing malloc for mf_subvalue and
> >> actset_stub in clone_xlate_actions(), allowed the worst case to go
> >> from around 19M to 12M, but this is still a lot of stack memory.
> >>
> >> One idea could be that, on the last action in the list, we try to
> >> unwind the stack (recursion) to the previous nonfinal action and
> >> then continue processing this action. I'm not too familiar with the
> >> xlate code, but it looks quite complex already, so I'm not sure if
> >> this is an option :) I'm also not sure if this gives us enough
> >> relief in all the OVN scenarios, as they use a lot of resubmits in a
> >> single action list.
> >>
> >> Another idea Dumitru had was to delay clone() execution until you
> >> get back to the root actionset. So when you hit a clone() action you
> >> store the state in the ctx, and then go over the list once you
> >> return (this could result in a growing list). But you do not end up
> >> processing the clones on the branch of the tree. The only problem is
> >> that this results in out-of-order processing of the action list,
> >> i.e.
> >> clone(resubmit(,5)), 2. This will first send out the packet on 2
> >> and then on the destination of the clone() action. I guess this is
> >> a blocking thing, as the OpenFlow specification specifies that
> >> action lists should be executed in order.
> >>
> >> Any other ideas on the above or on how to optimize the stack usage?
> >>
> > Hi Eelco,
> >
> > Can we just increase the stack space to a larger value?
> > Ex: setting ulimit -s to 32MB
> >
> > William
>
> You are right, I should have mentioned the problem statement and why
> this might not be desired.
>
> Let's assume you have a system with 56 cores. In this case, you will
> get roughly 56 threads, all taking 32M, so 1.7G. To make it even worse,
> we run OVS with the mlockall() option so all memory gets reserved and
> pinned into memory...
>
I see, thanks!
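
The figures quoted above can be sanity-checked with a little arithmetic. The sketch below assumes that "M" means MiB, that "4K" means 4096 resubmit levels, and, following the premise in the mail, that every thread's full stack allowance ends up locked. The function names are made up for this illustration and do not come from OVS.

```python
MIB = 1024 * 1024


def per_level_bytes(total_mib, depth=4096):
    """Average stack bytes consumed per resubmit level (assumes depth = 4096)."""
    return total_mib * MIB / depth


def locked_stack_mib(n_threads, stack_mib):
    """Total stack reservation in MiB if every thread gets stack_mib."""
    return n_threads * stack_mib


print(per_level_bytes(19))              # 4864.0 bytes per level (clone worst case)
print(per_level_bytes(12))              # 3072.0 bytes per level (after the malloc changes)
print(per_level_bytes(3.5))             # 896.0 bytes per level (plain resubmit loop)
print(locked_stack_mib(56, 32))         # 1792 MiB
print(locked_stack_mib(56, 32) / 1024)  # 1.75 GiB, the "1.7G" mentioned above
```

Under these assumptions each clone-heavy level costs several KiB of stack, which is why shaving per-frame buffers moves the worst case so noticeably.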
> I know the number of cores can be tuned with
> n-revalidator-threads/n-handler-threads and the stack size with systemd
> (in our case). But it would be good to minimize the stack usage in
> general, so we can avoid all this setup-specific tuning.
>
Agree, we should definitely think about how to minimize the stack.
As a short-term solution, can we use setrlimit() to set RLIMIT_STACK to
32M for handler-threads and mlock() only heap memory?

William
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
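
The ordering concern raised in the thread about delaying clone() execution can be illustrated with a toy model. This is only a sketch under stated assumptions: the tuple-based action encoding, the `TABLES` contents (table 5 simply outputs to a port numbered 5) and both function names are hypothetical, and nothing here reflects the actual xlate code. A real clone() would also snapshot packet state, which is omitted.

```python
from collections import deque

# Hypothetical toy model: an action is ("output", port), ("resubmit", table)
# or ("clone", [actions]). TABLES maps a table id to its action list.
TABLES = {5: [("output", 5)]}
ROOT = [("clone", [("resubmit", 5)]), ("output", 2)]  # clone(resubmit(,5)), 2


def translate_recursive(actions, out):
    """Run clone() bodies inline: in order, but every nesting level uses stack."""
    for kind, arg in actions:
        if kind == "output":
            out.append(arg)
        elif kind == "resubmit":
            translate_recursive(TABLES[arg], out)
        elif kind == "clone":
            translate_recursive(arg, out)
    return out


def translate_deferred(actions):
    """Queue clone() bodies and run them once the root list has finished."""
    out, pending = [], deque()

    def walk(acts):
        for kind, arg in acts:
            if kind == "output":
                out.append(arg)
            elif kind == "resubmit":
                walk(TABLES[arg])
            elif kind == "clone":
                pending.append(arg)  # stored for later; this list can grow

    walk(actions)
    while pending:
        walk(pending.popleft())
    return out


print(translate_recursive(ROOT, []))  # [5, 2]: clone body first, then port 2
print(translate_deferred(ROOT))       # [2, 5]: port 2 first, out of order
```

The deferred variant keeps clone bodies off the call stack, at the cost of a heap-side queue and the reordering shown in the second output.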