On Fri, Nov 22, 2019 at 2:40 PM Adhemerval Zanella
<adhemerval.zane...@linaro.org> wrote:
>
> Hi Arnd,
>
> I took a look at the stack usage issue in the kernel snippet you provided [1],
> and, as you noted, most of the impact indeed comes from the -ftree-ch
> optimization. It is enabled at all optimization levels except -Os (since,
> besides possibly increasing stack usage, it might also increase code size).
>
> I am still trying to fully grasp what the -ftree-ch optimization does, but my
> understanding so far is that it reorganizes loops for later loop
> optimization phases. More specifically, on this snippet it ends up creating
> extra stack variables for the internal member accesses in the inner loop
> (which in turn increases stack usage).

Thanks a lot for taking a detailed look!

>
> This is also why adding the compiler barrier inhibits the optimization: it
> prevents -ftree-ch from reorganizing the inner loop, so the loop is passed
> as-is to later optimization phases.
>
> It is also a generic pass that affects all architectures, although the
> resulting stack usage depends on later passes. With GCC 9.2.1, -O2 and
> -fstack-usage, I see the following stack usage (in bytes):
>
> arm                     632
> aarch64                 448
> powerpc                 912
> powerpc64le             560
> s390                    600
> s390x                   632
> i386                    1376
> x86_64                  784
>
> Also, -fconserve-stack does not really help here, since the tree-ch pass
> does not check that flag. Currently -fconserve-stack only seems to affect
> the inliner, by setting both the large-stack-frame and
> large-stack-frame-growth parameters to conservative values.
>
> The straightforward change I am checking is simply to disable the tree-ch
> optimization when -fconserve-stack is also enabled:
>
> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
> index b894a7e0918..b14dd66257c 100644
> --- a/gcc/tree-ssa-loop-ch.c
> +++ b/gcc/tree-ssa-loop-ch.c
> @@ -291,7 +291,8 @@ public:
>    {}
>
>    /* opt_pass methods: */
> -  virtual bool gate (function *) { return flag_tree_ch != 0; }
> +  virtual bool gate (function *) { return flag_tree_ch != 0
> +                                         && flag_conserve_stack == 0; }
>
>    /* Initialize and finalize loop structures, copying headers inbetween.  */
>    virtual unsigned int execute (function *);

That assumes that -ftree-ch generally results in higher stack usage,
which is something we would have to confirm first. I've done
similar checks before on other options, basically building a large
project like the kernel with -Wframe-larger-than=128 (or similar),
and then comparing the warning output with/without that flag.

That would tell us whether this is a systematic problem with
-ftree-ch (making your patch a good idea) or whether the example
code just hit a worst case that is otherwise rare, and turning off
-ftree-ch generally just leads to worse output but no lower stack
usage.

One suspicion I have is that this is related to not only having
a large struct, but also having lots of 64-bit members in that
struct and working on it on a 32-bit architecture.

        Arnd
_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
