[Bug tree-optimization/90579] [8/9/10 Regression] Huge store forward stall due to vectorizer

rguenther at suse dot de Wed, 31 Jul 2019 02:44:11 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 31 Jul 2019, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
> 
> --- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
> Transform the second loop as follows:
> 
> diff --git a/loop.c b/loop.c
> index feea9ea..81a3ea6 100644
> --- a/loop.c
> +++ b/loop.c
> @@ -9,6 +9,6 @@ loop (int k, double x)
>    for (i=0;i<6;i++)
>      r[i] = x * a[i + k];
>    for (i=0;i<6;i++)
> -    t+=r[5-i];
> +    t+=r[i];  /* ascending order, matching the first loop */
>    return t;
>  }
> 
> This avoids the store-forwarding stalls.
> 
> Before loop transform:
> 
> loop_avx256: 3710992
> loop       : 671995
> loop_avx128: 650882
> 
> After loop transform:
> 
> loop_avx256: 661386
> loop       : 652932
> loop_avx128: 568710

Since the loop is probably unrolled, this would be a task for
reassociation, which should try to arrange the data dependences so
that the scheduler can then order the memory accesses by ascending
address without increasing register pressure (this would also help
with using pre/post-increment addressing modes on some targets).
Currently the operand rank for memory accesses is determined only by
looking at the rank of their SSA uses (of which there may be none).
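
To make the intended effect concrete, here is a minimal sketch in C
of the reduction before and after such a reordering.  The array
contents, the array size and the function names loop_descending and
loop_ascending are made up for illustration; only the loop bodies
follow the diff quoted above.  Note that reordering a floating-point
sum can change rounding in general, so the compiler may only do this
itself when reassociation is permitted (e.g. -ffast-math); the values
below are chosen so that both orders are exact.

```c
/* Hypothetical stand-ins for the globals of the test case. */
static double a[16] = { 1,  2,  3,  4,  5,  6,  7,  8,
                        9, 10, 11, 12, 13, 14, 15, 16 };
static double r[6];

/* Original shape: stores to r[] ascend, the reduction reads r[]
   in descending order (r[5], r[4], ..., r[0]).  */
double
loop_descending (int k, double x)
{
  int i;
  double t = 0;
  for (i = 0; i < 6; i++)
    r[i] = x * a[i + k];
  for (i = 0; i < 6; i++)
    t += r[5 - i];
  return t;
}

/* Reordered shape: the reduction reads r[] in the same ascending
   order in which it was stored (r[0], r[1], ..., r[5]).  */
double
loop_ascending (int k, double x)
{
  int i;
  double t = 0;
  for (i = 0; i < 6; i++)
    r[i] = x * a[i + k];
  for (i = 0; i < 6; i++)
    t += r[i];
  return t;
}
```

With k = 2 and x = 2.0 both functions sum 2 * (3 + 4 + ... + 8) and
agree exactly; for arbitrary inputs the two orders may differ in the
last bits.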

