On Fri, Nov 22, 2019 at 2:40 PM Adhemerval Zanella
<adhemerval.zane...@linaro.org> wrote:
>
> Hi Arnd,
>
> I took a look at the stack usage issue in the kernel snippet you provided [1],
> and as you noted, most of the impact indeed comes from the -ftree-ch
> optimization. It is enabled at all optimization levels except -Os (since,
> besides possibly increasing stack usage, it might also increase code size).
>
> I am still fully grasping what the -ftree-ch optimization does, but my
> understanding so far is that it tries to reorganize the loop for later loop
> optimization phases. More specifically, what it ends up doing on this
> snippet is creating extra stack variables for the internal member accesses
> in the inner loop (which in turn increases stack usage).

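
For readers who have not looked at the pass: the GCC manual describes
-ftree-ch as loop header copying. The sketch below is hand-written C that
only illustrates the shape of that transformation; it is not GCC output, it
does not reproduce the stack growth from the kernel snippet, and the function
names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the exit test sits at the top of the loop. */
static long sum_while(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    while (i < n) {
        total += v[i];
        i++;
    }
    return total;
}

/* Hand-written equivalent after header copying: the exit test is
   duplicated once in front of the loop, and the loop body becomes a
   do-while with the test at the bottom. */
static long sum_header_copied(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    if (i < n) {
        do {
            total += v[i];
            i++;
        } while (i < n);
    }
    return total;
}

/* Both forms must agree, including for an empty input. */
static void check_equivalent(void)
{
    static const int data[] = { 1, 2, 3, 4 };

    assert(sum_while(data, 4) == sum_header_copied(data, 4));
    assert(sum_while(data, 0) == sum_header_copied(data, 0));
}
```

The duplicated test is what gives later passes a loop body that is known to
run at least once; how that interacts with the temporaries in the kernel
snippet is exactly the open question in this thread.
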
Thanks a lot for taking a detailed look!

> This is also why adding the compiler barrier inhibits the optimization:
> it prevents -ftree-ch from reorganizing the inner loop, which is then
> passed as-is to the later optimization phases.
>
> It is also a generic pass that affects all architectures, although the
> resulting stack usage depends on later passes. With GCC 9.2.1 I see the
> following stack usage, using -fstack-usage along with -O2:
>
> arm          632
> aarch64      448
> powerpc      912
> powerpc64le  560
> s390         600
> s390x        632
> i386        1376
> x86_64       784
>
> Also, -fconserve-stack does not really help with this pass, since -ftree-ch
> does not check that flag. -fconserve-stack currently only seems to affect
> the inliner, by setting both large-stack-frame and large-stack-frame-growth
> to some conservative values.
>
> The straightforward change I am checking is just to disable the tree-ch
> optimization if -fconserve-stack is also enabled:
>
> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
> index b894a7e0918..b14dd66257c 100644
> --- a/gcc/tree-ssa-loop-ch.c
> +++ b/gcc/tree-ssa-loop-ch.c
> @@ -291,7 +291,8 @@ public:
>    {}
>
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return flag_tree_ch != 0; }
> +  virtual bool gate (function *) { return flag_tree_ch != 0
> +                                          && flag_conserve_stack == 0; }
>
>    /* Initialize and finalize loop structures, copying headers inbetween.  */
>    virtual unsigned int execute (function *);

That assumes that -ftree-ch generally results in higher stack usage, which is
something we would have to confirm first. I've done similar checks before on
other options, basically building a large project like the kernel with
-Wframe-larger-than=128 (or similar), and then comparing the warning output
with/without that flag.
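
As a minimal sketch of what such a build flags, here is a function whose
frame should exceed 128 bytes. The struct, the function names, and the file
name frame.c used below are hypothetical and are not taken from the kernel
snippet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a large structure of 64-bit members. */
struct big_state {
    uint64_t counters[32];   /* 32 * 8 = 256 bytes */
};

/* Keeps a full copy of the structure on the stack. The volatile
   qualifier stops the compiler from optimizing the copy away, so the
   frame has to hold at least the 256 bytes of 'local'. */
uint64_t sum_counters(const struct big_state *in)
{
    volatile struct big_state local;
    uint64_t total = 0;

    for (int i = 0; i < 32; i++)
        local.counters[i] = in->counters[i];
    for (int i = 0; i < 32; i++)
        total += local.counters[i];
    return total;
}

/* Sanity check: counters 0..31 sum to 496. */
static void self_check(void)
{
    struct big_state s;

    for (int i = 0; i < 32; i++)
        s.counters[i] = (uint64_t)i;
    assert(sum_counters(&s) == 496);
}
```

Compiling this with something like "gcc -O2 -Wframe-larger-than=128 -c
frame.c" should produce a frame-size warning for sum_counters, and adding
-fstack-usage writes the per-function numbers to a .su file. Counting those
warnings across a whole tree, once with the default flags and once with
-fno-tree-ch, gives the comparison.
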
That would tell us whether this is a systematic problem with -ftree-ch
(making your patch a good idea) or whether the example code just hit a worst
case that is otherwise rare, in which case turning off -ftree-ch generally
would just lead to worse output but no lower stack usage.

One suspicion I have is that this is related to not only having a large
struct, but also having lots of 64-bit members in that struct and working on
it on a 32-bit architecture.

      Arnd
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain