Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-06-06 Thread Manolis Tsamis
On Fri, May 24, 2024 at 9:27 AM Richard Biener  wrote:
>
> On Thu, 23 May 2024, Manolis Tsamis wrote:
>
> > > This pass detects cases of expensive store forwarding and tries to avoid them
> > > by reordering the stores and using suitable bit insertion sequences.
> > > For example it can transform this:
> > >
> > >  strb w2, [x1, 1]
> > >  ldr x0, [x1]  # Epxensive store forwarding to larger load.
> > >
> > > To:
> > >
> > >  ldr x0, [x1]
> > >  strb w2, [x1]
> > >  bfi x0, x2, 0, 8
>
> How do we represent atomics?  If the latter is a load-acquire or release
> the transform would be invalid.
>
> > Assembly like this can appear with bitfields or type punning / unions.
> > > On stress-ng when running the cpu-union microbenchmark the following speedups
> > > have been observed.
> > >
> > >   Neoverse-N1:      +29.4%
> > >   Intel Coffeelake: +13.1%
> > >   AMD 5950X:        +17.5%
> >
> >   PR rtl-optimization/48696
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in: Add avoid-store-forwarding.o.
> >   * common.opt: New option -favoid-store-forwarding.
> >   * params.opt: New param store-forwarding-max-distance.
> >   * passes.def: Schedule a new pass.
> >   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> >   * avoid-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/avoid-store-forwarding-1.c: New test.
> >   * gcc.dg/avoid-store-forwarding-2.c: New test.
> >   * gcc.dg/avoid-store-forwarding-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> > >  gcc/Makefile.in                       |   1 +
> > >  gcc/avoid-store-forwarding.cc         | 554 ++
> > >  gcc/common.opt                        |   4 +
> > >  gcc/params.opt                        |   4 +
> > >  gcc/passes.def                        |   1 +
> > >  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
> > >  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
> > >  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
> > >  gcc/tree-pass.h                       |   1 +
> >  9 files changed, 681 insertions(+)
> >  create mode 100644 gcc/avoid-store-forwarding.cc
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index a7f15694c34..be969b1ca1d 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1681,6 +1681,7 @@ OBJS = \
> >   statistics.o \
> >   stmt.o \
> >   stor-layout.o \
> > + avoid-store-forwarding.o \
> >   store-motion.o \
> >   streamer-hooks.o \
> >   stringpool.o \
> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> > new file mode 100644
> > index 000..d90627c4872
> > --- /dev/null
> > +++ b/gcc/avoid-store-forwarding.cc
> > @@ -0,0 +1,554 @@
> > +/* Avoid store forwarding optimization pass.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +   Contributed by VRULL GmbH.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along with GCC; see the file COPYING3.  If not see
> > +   .  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "alias.h"
> > +#include "rtlanal.h"
> > +#include "tree-pass.h"
> > +#include "cselib.h"
> > +#include "predict.h"
> > +#include "insn-config.h"
> > +#include "expmed.h"
> > +#include "recog.h"
> > +#include "regset.h"
> > +#include "df.h"
> > +#include "expr.h"
> > +#include "memmodel.h"
> > +#include "emit-rtl.h"
> > +#include "vec.h"
> > +
> > +/* This pass tries to detect and avoid cases of store forwarding.
> > +   On many processors there is a large penalty when smaller stores are
> > +   forwarded to larger loads.  The idea used to avoid the stall is to move
> > +   the store after the load and in addition emit a bit insert sequence so
> > +   the load register has the correct value.  For example the following:
> > +
> > > + strb w2, [x1, 1]
> > > + ldr x0, [x1]
> > > +
> > > +   Will be transformed to:
> > > +
> > > + ldr x0, [x1]
> > > + and w2, w2, 255
> > > + strb w2, [x1]
> > + bfi x0

Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-05-30 Thread Manolis Tsamis
On Fri, May 24, 2024 at 9:27 AM Richard Biener  wrote:
>
> On Thu, 23 May 2024, Manolis Tsamis wrote:
>
> > > This pass detects cases of expensive store forwarding and tries to avoid them
> > > by reordering the stores and using suitable bit insertion sequences.
> > > For example it can transform this:
> > >
> > >  strb w2, [x1, 1]
> > >  ldr x0, [x1]  # Epxensive store forwarding to larger load.
> > >
> > > To:
> > >
> > >  ldr x0, [x1]
> > >  strb w2, [x1]
> > >  bfi x0, x2, 0, 8
>
> How do we represent atomics?  If the latter is a load-acquire or release
> the transform would be invalid.
>
As you noted, this transformation is not valid for acquire/release
accesses. When the pass finds an instruction for which volatile_refs_p
is true, it drops all candidates, so there is no correctness issue.
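
To illustrate the rule described above, here is a toy model in C. It is
only a sketch and not code from the patch: toy_insn, has_volatile_ref and
pending_stores_at are hypothetical names, and the boolean field merely
stands in for whatever volatile_refs_p would report on the real RTL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an RTL instruction.  */
enum insn_kind { INSN_STORE, INSN_LOAD, INSN_OTHER };

struct toy_insn {
    enum insn_kind kind;
    bool has_volatile_ref;  /* models volatile_refs_p () being true */
};

/* Count the store candidates that are still pending when a forward scan
   reaches the instruction at index LOAD_IDX.  Any instruction with a
   volatile reference discards every candidate collected so far.  */
static size_t pending_stores_at(const struct toy_insn *insns, size_t n,
                                size_t load_idx)
{
    assert(load_idx <= n);

    size_t pending = 0;
    for (size_t i = 0; i < load_idx; i++) {
        if (insns[i].has_volatile_ref)
            pending = 0;        /* drop all candidates */
        else if (insns[i].kind == INSN_STORE)
            pending++;          /* remember this store as a candidate */
    }
    return pending;
}
```

In this model a store that precedes a volatile access is never paired
with a later load, which is the conservative behaviour the reply relies on.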

> > Assembly like this can appear with bitfields or type punning / unions.
> > > On stress-ng when running the cpu-union microbenchmark the following speedups
> > > have been observed.
> > >
> > >   Neoverse-N1:      +29.4%
> > >   Intel Coffeelake: +13.1%
> > >   AMD 5950X:        +17.5%
> >
> >   PR rtl-optimization/48696
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in: Add avoid-store-forwarding.o.
> >   * common.opt: New option -favoid-store-forwarding.
> >   * params.opt: New param store-forwarding-max-distance.
> >   * passes.def: Schedule a new pass.
> >   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> >   * avoid-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/avoid-store-forwarding-1.c: New test.
> >   * gcc.dg/avoid-store-forwarding-2.c: New test.
> >   * gcc.dg/avoid-store-forwarding-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> > >  gcc/Makefile.in                       |   1 +
> > >  gcc/avoid-store-forwarding.cc         | 554 ++
> > >  gcc/common.opt                        |   4 +
> > >  gcc/params.opt                        |   4 +
> > >  gcc/passes.def                        |   1 +
> > >  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
> > >  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
> > >  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
> > >  gcc/tree-pass.h                       |   1 +
> >  9 files changed, 681 insertions(+)
> >  create mode 100644 gcc/avoid-store-forwarding.cc
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index a7f15694c34..be969b1ca1d 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1681,6 +1681,7 @@ OBJS = \
> >   statistics.o \
> >   stmt.o \
> >   stor-layout.o \
> > + avoid-store-forwarding.o \
> >   store-motion.o \
> >   streamer-hooks.o \
> >   stringpool.o \
> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> > new file mode 100644
> > index 000..d90627c4872
> > --- /dev/null
> > +++ b/gcc/avoid-store-forwarding.cc
> > @@ -0,0 +1,554 @@
> > +/* Avoid store forwarding optimization pass.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +   Contributed by VRULL GmbH.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along with GCC; see the file COPYING3.  If not see
> > +   .  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "alias.h"
> > +#include "rtlanal.h"
> > +#include "tree-pass.h"
> > +#include "cselib.h"
> > +#include "predict.h"
> > +#include "insn-config.h"
> > +#include "expmed.h"
> > +#include "recog.h"
> > +#include "regset.h"
> > +#include "df.h"
> > +#include "expr.h"
> > +#include "memmodel.h"
> > +#include "emit-rtl.h"
> > +#include "vec.h"
> > +
> > +/* This pass tries to detect and avoid cases of store forwarding.
> > +   On many processors there is a large penalty when smaller stores are
> > +   forwarded to larger loads.  The idea used to avoid the stall is to move
> > +   the store after the load and in addition emit a bit insert sequence so
> > +   the load register has the correct value.  For example the following:
> > +
> > > + strb w2, [x1

Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-05-23 Thread Richard Biener
On Thu, 23 May 2024, Manolis Tsamis wrote:

> This pass detects cases of expensive store forwarding and tries to avoid them
> by reordering the stores and using suitable bit insertion sequences.
> For example it can transform this:
> 
>  strb w2, [x1, 1]
>  ldr x0, [x1]  # Epxensive store forwarding to larger load.
> 
> To:
> 
>  ldr x0, [x1]
>  strb w2, [x1]
>  bfi x0, x2, 0, 8

How do we represent atomics?  If the latter is a load-acquire or release
the transform would be invalid.

> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
> 
>   Neoverse-N1:      +29.4%
>   Intel Coffeelake: +13.1%
>   AMD 5950X:        +17.5%
> 
>   PR rtl-optimization/48696
> 
> gcc/ChangeLog:
> 
>   * Makefile.in: Add avoid-store-forwarding.o.
>   * common.opt: New option -favoid-store-forwarding.
>   * params.opt: New param store-forwarding-max-distance.
>   * passes.def: Schedule a new pass.
>   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
>   * avoid-store-forwarding.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/avoid-store-forwarding-1.c: New test.
>   * gcc.dg/avoid-store-forwarding-2.c: New test.
>   * gcc.dg/avoid-store-forwarding-3.c: New test.
> 
> Signed-off-by: Manolis Tsamis 
> ---
> 
>  gcc/Makefile.in                       |   1 +
>  gcc/avoid-store-forwarding.cc         | 554 ++
>  gcc/common.opt                        |   4 +
>  gcc/params.opt                        |   4 +
>  gcc/passes.def                        |   1 +
>  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
>  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
>  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
>  gcc/tree-pass.h                       |   1 +
>  9 files changed, 681 insertions(+)
>  create mode 100644 gcc/avoid-store-forwarding.cc
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..be969b1ca1d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1681,6 +1681,7 @@ OBJS = \
>   statistics.o \
>   stmt.o \
>   stor-layout.o \
> + avoid-store-forwarding.o \
>   store-motion.o \
>   streamer-hooks.o \
>   stringpool.o \
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> new file mode 100644
> index 000..d90627c4872
> --- /dev/null
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -0,0 +1,554 @@
> +/* Avoid store forwarding optimization pass.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by VRULL GmbH.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "alias.h"
> +#include "rtlanal.h"
> +#include "tree-pass.h"
> +#include "cselib.h"
> +#include "predict.h"
> +#include "insn-config.h"
> +#include "expmed.h"
> +#include "recog.h"
> +#include "regset.h"
> +#include "df.h"
> +#include "expr.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "vec.h"
> +
> +/* This pass tries to detect and avoid cases of store forwarding.
> +   On many processors there is a large penalty when smaller stores are
> +   forwarded to larger loads.  The idea used to avoid the stall is to move
> +   the store after the load and in addition emit a bit insert sequence so
> +   the load register has the correct value.  For example the following:
> +
> + strb w2, [x1, 1]
> + ldr x0, [x1]
> +
> +   Will be transformed to:
> +
> + ldr x0, [x1]
> + and w2, w2, 255
> + strb w2, [x1]
> + bfi x0, x2, 0, 8
> +*/
> +
> +namespace {
> +
> +const pass_data pass_data_avoid_store_forwarding =
> +{
> +  RTL_PASS, /* type.  */
> +  "avoid_store_forwarding", /* name.  */
> +  OPTGROUP_NONE, /* optinfo_flags.  */
> +  TV_NONE, /* tv_id.  */
> +  0, /* properties_required.  */
> +  0, /* properties_provided.  */
> +  0,

Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-05-23 Thread Philipp Tomsich
On Thu, 23 May 2024 at 18:18, Andrew Pinski  wrote:
>
> > On Thu, May 23, 2024 at 8:01 AM Manolis Tsamis wrote:
> > >
> > > This pass detects cases of expensive store forwarding and tries to avoid them
> > > by reordering the stores and using suitable bit insertion sequences.
> > > For example it can transform this:
> > >
> > >  strb w2, [x1, 1]
> > >  ldr x0, [x1]  # Epxensive store forwarding to larger load.

@Manolis: looks like a typo slipped through: Epxensive -> Expensive

> >
> > To:
> >
> >  ldr x0, [x1]
> > >  strb w2, [x1]
> >  bfi x0, x2, 0, 8
> >
>
> Are you sure this is correct with respect to the C11/C++11 memory
> models? If not then the pass should be gated with
> flag_store_data_races.

This optimization (i.e., the reordering and usage of the
bfi-instruction) should always be safe and not violate the C++11
memory model, as we still perform the same stores (i.e., with the same
width).
Keeping the same stores around (and only reordering them relative to
the loads) ensures that only the bytes containing the adjacent bits
are overwritten.
This pass never tries to merge multiple stores (although later passes
may), but only reorders those relative to a (wider) load we are
forwarding into.
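
As a rough illustration of the argument above, the following C sketch
models both instruction orders on a plain byte buffer. It is not taken
from the patch: bit_insert, store_then_load and load_then_store are
hypothetical helpers, bit_insert only emulates what a bfi-style
instruction does, and a little-endian target is assumed, so a byte
stored at offset off lands at bit off * 8 of the loaded value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulate a bit-field insert: copy the low WIDTH bits of SRC into DST
   starting at bit LSB, leaving all other bits of DST unchanged.  */
static uint64_t bit_insert(uint64_t dst, uint64_t src, unsigned lsb,
                           unsigned width)
{
    assert(width >= 1 && width <= 64 && lsb <= 64 - width);

    uint64_t field = (width == 64) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << width) - 1);
    uint64_t mask = field << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Original order: narrow store followed by a wider load, which is the
   pattern that needs store forwarding.  */
static uint64_t store_then_load(unsigned char *p, unsigned off, uint8_t v)
{
    uint64_t r;
    p[off] = v;
    memcpy(&r, p, sizeof r);
    return r;
}

/* Reordered: wide load first, then the same narrow store, then a bit
   insert so the loaded value matches (little-endian assumed).  */
static uint64_t load_then_store(unsigned char *p, unsigned off, uint8_t v)
{
    uint64_t r;
    memcpy(&r, p, sizeof r);
    p[off] = v;
    return bit_insert(r, v, off * 8, 8);
}
```

Both helpers perform exactly one byte-wide store to the same address, so
the memory contents end up identical; only the loaded value is patched up
in a register.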

> Also stores like this start a new "alias set" (I can't remember the
> exact term here). So how do you represent the store's aliasing set? Do
> you change it? If not, are you sure that will do the right thing?
>
> You didn't document the new option or the new --param (invoke.texi);
> this is the bare minimum requirement.
> Note you should add documentation for the new pass in the internals
> manual (passes.texi) (note most folks forget to update this when
> adding a new pass).
>
> Thanks,
> Andrew
>
>
> > Assembly like this can appear with bitfields or type punning / unions.
> > > On stress-ng when running the cpu-union microbenchmark the following speedups
> > > have been observed.
> > >
> > >   Neoverse-N1:      +29.4%
> > >   Intel Coffeelake: +13.1%
> > >   AMD 5950X:        +17.5%
> >
> > PR rtl-optimization/48696
> >
> > gcc/ChangeLog:
> >
> > * Makefile.in: Add avoid-store-forwarding.o.
> > * common.opt: New option -favoid-store-forwarding.
> > * params.opt: New param store-forwarding-max-distance.
> > * passes.def: Schedule a new pass.
> > * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> > * avoid-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/avoid-store-forwarding-1.c: New test.
> > * gcc.dg/avoid-store-forwarding-2.c: New test.
> > * gcc.dg/avoid-store-forwarding-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> > >  gcc/Makefile.in                       |   1 +
> > >  gcc/avoid-store-forwarding.cc         | 554 ++
> > >  gcc/common.opt                        |   4 +
> > >  gcc/params.opt                        |   4 +
> > >  gcc/passes.def                        |   1 +
> > >  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
> > >  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
> > >  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
> > >  gcc/tree-pass.h                       |   1 +
> >  9 files changed, 681 insertions(+)
> >  create mode 100644 gcc/avoid-store-forwarding.cc
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index a7f15694c34..be969b1ca1d 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1681,6 +1681,7 @@ OBJS = \
> > statistics.o \
> > stmt.o \
> > stor-layout.o \
> > +   avoid-store-forwarding.o \
> > store-motion.o \
> > streamer-hooks.o \
> > stringpool.o \
> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> > new file mode 100644
> > index 000..d90627c4872
> > --- /dev/null
> > +++ b/gcc/avoid-store-forwarding.cc
> > @@ -0,0 +1,554 @@
> > +/* Avoid store forwarding optimization pass.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +   Contributed by VRULL GmbH.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along wi

Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-05-23 Thread Andrew Pinski
On Thu, May 23, 2024 at 8:01 AM Manolis Tsamis  wrote:
>
> This pass detects cases of expensive store forwarding and tries to avoid them
> by reordering the stores and using suitable bit insertion sequences.
> For example it can transform this:
>
>  strb w2, [x1, 1]
>  ldr x0, [x1]  # Epxensive store forwarding to larger load.
>
> To:
>
>  ldr x0, [x1]
>  strb w2, [x1]
>  bfi x0, x2, 0, 8
>

Are you sure this is correct with respect to the C11/C++11 memory
models? If not then the pass should be gated with
flag_store_data_races.
Also stores like this start a new "alias set" (I can't remember the
exact term here). So how do you represent the store's aliasing set? Do
you change it? If not, are you sure that will do the right thing?

You didn't document the new option or the new --param (invoke.texi);
this is the bare minimum requirement.
Note you should add documentation for the new pass in the internals
manual (passes.texi) (note most folks forget to update this when
adding a new pass).

Thanks,
Andrew


> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
>   Neoverse-N1:      +29.4%
>   Intel Coffeelake: +13.1%
>   AMD 5950X:        +17.5%
>
> PR rtl-optimization/48696
>
> gcc/ChangeLog:
>
> * Makefile.in: Add avoid-store-forwarding.o.
> * common.opt: New option -favoid-store-forwarding.
> * params.opt: New param store-forwarding-max-distance.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> * avoid-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/avoid-store-forwarding-1.c: New test.
> * gcc.dg/avoid-store-forwarding-2.c: New test.
> * gcc.dg/avoid-store-forwarding-3.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/Makefile.in                       |   1 +
>  gcc/avoid-store-forwarding.cc         | 554 ++
>  gcc/common.opt                        |   4 +
>  gcc/params.opt                        |   4 +
>  gcc/passes.def                        |   1 +
>  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
>  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
>  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
>  gcc/tree-pass.h                       |   1 +
>  9 files changed, 681 insertions(+)
>  create mode 100644 gcc/avoid-store-forwarding.cc
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..be969b1ca1d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1681,6 +1681,7 @@ OBJS = \
> statistics.o \
> stmt.o \
> stor-layout.o \
> +   avoid-store-forwarding.o \
> store-motion.o \
> streamer-hooks.o \
> stringpool.o \
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> new file mode 100644
> index 000..d90627c4872
> --- /dev/null
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -0,0 +1,554 @@
> +/* Avoid store forwarding optimization pass.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by VRULL GmbH.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "alias.h"
> +#include "rtlanal.h"
> +#include "tree-pass.h"
> +#include "cselib.h"
> +#include "predict.h"
> +#include "insn-config.h"
> +#include "expmed.h"
> +#include "recog.h"
> +#include "regset.h"
> +#include "df.h"
> +#include "expr.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "vec.h"
> +
> +/* This pass tries to detect and avoid cases of store forwarding.
> +   On many processors there is a large penalty when smaller stores are
> +   forwarded to larger loads.  The idea used to avoid the stall is to move
> +   the store after the load and in addition emit a bit insert sequence so
> +   the load register has the