Re: [PATCH v3] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-09-21 Thread Noah Goldstein via Gcc-patches
On Sat, Jul 9, 2022 at 8:59 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/21/2022 12:12 PM, Noah Goldstein via Gcc-patches wrote:
> > This patch allows for strchr(x, c) to the replace with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> >   PR tree-optimization/95821
> >
> > gcc/
> >
> >   * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> >   memchr instead of strchr if strlen already computed.
> >
> > gcc/testsuite/
> >
> >   * c-c++-common/pr95821-1.c: New test.
> >   * c-c++-common/pr95821-2.c: New test.
> >   * c-c++-common/pr95821-3.c: New test.
> >   * c-c++-common/pr95821-4.c: New test.
> >   * c-c++-common/pr95821-5.c: New test.
> >   * c-c++-common/pr95821-6.c: New test.
> >   * c-c++-common/pr95821-7.c: New test.
> >   * c-c++-common/pr95821-8.c: New test.
> Given Jakub's involvement to-date and the fact this touches
> tree-ssa-strlen.cc I think Jakub should have final ACK/NAK on this.
>
> jeff
>

Ping.


Re: [PATCH v2] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-07-07 Thread Noah Goldstein via Gcc-patches
On Tue, Jun 21, 2022 at 11:13 AM Noah Goldstein  wrote:
>
> On Tue, Jun 21, 2022 at 5:01 AM Jakub Jelinek  wrote:
> >
> > On Mon, Jun 20, 2022 at 02:42:20PM -0700, Noah Goldstein wrote:
> > > This patch allows for strchr(x, c) to the replace with memchr(x, c,
> > > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > > tree.
> > >
> > > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> > >
> > > Since memchr doesn't need to re-find the null terminator it is faster
> > > than strchr.
> > >
> > > bootstrapped and tested on x86_64-linux.
> > >
> > >   PR tree-optimization/95821
> >
> > This should be indented by a single tab, not two.
>
> Fixed in V3
> > >
> > > gcc/
> > >
> > >   * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> > >   memchr instead of strchr if strlen already computed.
> > >
> > > gcc/testsuite/
> > >
> > >   * c-c++-common/pr95821-1.c: New test.
> > >   * c-c++-common/pr95821-2.c: New test.
> > >   * c-c++-common/pr95821-3.c: New test.
> > >   * c-c++-common/pr95821-4.c: New test.
> > >   * c-c++-common/pr95821-5.c: New test.
> > >   * c-c++-common/pr95821-6.c: New test.
> > >   * c-c++-common/pr95821-7.c: New test.
> > >   * c-c++-common/pr95821-8.c: New test.
> > > --- a/gcc/tree-ssa-strlen.cc
> > > +++ b/gcc/tree-ssa-strlen.cc
> > > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
> > >  }
> > >  }
> > >
> > > -/* Handle a strchr call.  If strlen of the first argument is known, 
> > > replace
> > > -   the strchr (x, 0) call with the endptr or x + strlen, otherwise 
> > > remember
> > > -   that lhs of the call is endptr and strlen of the argument is endptr - 
> > > x.  */
> > > +/* Handle a strchr call.  If strlen of the first argument is known,
> > > +   replace the strchr (x, 0) call with the endptr or x + strlen,
> > > +   otherwise remember that lhs of the call is endptr and strlen of the
> > > +   argument is endptr - x.  If strlen of x is not know but has been
> > > +   computed earlier in the tree then replace strchr(x, c) to
> >
> > Still missing space before ( above.
>
> Sorry, fixed that in V3.
> >
> > > +   memchr (x, c, strlen + 1).  */
> > >
> > >  void
> > >  strlen_pass::handle_builtin_strchr ()
> > > @@ -2418,8 +2421,12 @@ strlen_pass::handle_builtin_strchr ()
> > >if (lhs == NULL_TREE)
> > >  return;
> > >
> > > -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> > > -return;
> > > +  tree chr = gimple_call_arg (stmt, 1);
> > > +  /* strchr only uses the lower char of input so to check if its
> > > + strchr (s, zerop) only take into account the lower char.  */
> > > +  bool is_strchr_zerop
> > > +  = (TREE_CODE (chr) == INTEGER_CST
> > > +  && integer_zerop (fold_convert (char_type_node, chr)));
> >
> > The indentation rule is that = should be 2 columns to the right from bool,
> > so
> >
>
> Fixed in V3.
> >   bool is_strchr_zerop
> > = (TREE_CODE (chr) == INTEGER_CST
> >&& integer_zerop (fold_convert (char_type_node, chr)));
> >
> > > +   /* If its not strchr (s, zerop) then try and convert to
> > > +  memchr since strlen has already been computed.  */
> >
> > This comment still has the second line weirdly indented.
>
> Sorry, have emacs with 4-space tabs so things that look right arent
> as they seem :/
>
> Fixed in V3 I believe.
> >
> > > +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> > > +
> > > +   /* Only need to check length strlen (s) + 1 if chr may be 
> > > zero.
> > > + Otherwise the last chr (which is known to be zero) can never
> > > + be a match.  NB: We don't need to test if chr is a non-zero
> > > + integer const with zero char bits because that is taken into
> > > + account with is_strchr_zerop.  */
> > > +   if (!tree_expr_nonzero_p (chr))
> >
> > The above is unsafe though.  tree_expr_nonzero_p (chr) will return true
> > if say VRP can prove it is not zero, but because of the implicit
> > (char) chr cast done by the function we need something different.
> > Say if VRP determines that chr is in [1, INT_MAX] or even just [255, 257]
> > it doesn't mean (char) chr won't be 0.
> > So, as I've tried to explain in the previous mail, it can be done e.g. with
>
> Added your code in V3. Thanks for the help.
> >   bool chr_nonzero = false;
> >   if (TREE_CODE (chr) == INTEGER_CST
> >   && integer_nonzerop (fold_convert (char_type_node, chr)))
> > chr_nonzero = true;
> >   else if (TREE_CODE (chr) == SSA_NAME
> >&& CHAR_TYPE_SIZE < INT_TYPE_SIZE)
> > {
> >   value_range r;
> >   /* Try to determine using ranges if (char) chr must
> >  be always 0.  That is true e.g. if all the subranges
> >  have the INT_TYPE_SIZE - 

Re: [PATCH v2] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-21 Thread Noah Goldstein via Gcc-patches
On Tue, Jun 21, 2022 at 5:01 AM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 02:42:20PM -0700, Noah Goldstein wrote:
> > This patch allows for strchr(x, c) to the replace with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> >   PR tree-optimization/95821
>
> This should be indented by a single tab, not two.

Fixed in V3
> >
> > gcc/
> >
> >   * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> >   memchr instead of strchr if strlen already computed.
> >
> > gcc/testsuite/
> >
> >   * c-c++-common/pr95821-1.c: New test.
> >   * c-c++-common/pr95821-2.c: New test.
> >   * c-c++-common/pr95821-3.c: New test.
> >   * c-c++-common/pr95821-4.c: New test.
> >   * c-c++-common/pr95821-5.c: New test.
> >   * c-c++-common/pr95821-6.c: New test.
> >   * c-c++-common/pr95821-7.c: New test.
> >   * c-c++-common/pr95821-8.c: New test.
> > --- a/gcc/tree-ssa-strlen.cc
> > +++ b/gcc/tree-ssa-strlen.cc
> > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
> >  }
> >  }
> >
> > -/* Handle a strchr call.  If strlen of the first argument is known, replace
> > -   the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
> > -   that lhs of the call is endptr and strlen of the argument is endptr - 
> > x.  */
> > +/* Handle a strchr call.  If strlen of the first argument is known,
> > +   replace the strchr (x, 0) call with the endptr or x + strlen,
> > +   otherwise remember that lhs of the call is endptr and strlen of the
> > +   argument is endptr - x.  If strlen of x is not know but has been
> > +   computed earlier in the tree then replace strchr(x, c) to
>
> Still missing space before ( above.

Sorry, fixed that in V3.
>
> > +   memchr (x, c, strlen + 1).  */
> >
> >  void
> >  strlen_pass::handle_builtin_strchr ()
> > @@ -2418,8 +2421,12 @@ strlen_pass::handle_builtin_strchr ()
> >if (lhs == NULL_TREE)
> >  return;
> >
> > -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> > -return;
> > +  tree chr = gimple_call_arg (stmt, 1);
> > +  /* strchr only uses the lower char of input so to check if its
> > + strchr (s, zerop) only take into account the lower char.  */
> > +  bool is_strchr_zerop
> > +  = (TREE_CODE (chr) == INTEGER_CST
> > +  && integer_zerop (fold_convert (char_type_node, chr)));
>
> The indentation rule is that = should be 2 columns to the right from bool,
> so
>

Fixed in V3.
>   bool is_strchr_zerop
> = (TREE_CODE (chr) == INTEGER_CST
>&& integer_zerop (fold_convert (char_type_node, chr)));
>
> > +   /* If its not strchr (s, zerop) then try and convert to
> > +  memchr since strlen has already been computed.  */
>
> This comment still has the second line weirdly indented.

Sorry, have emacs with 4-space tabs so things that look right arent
as they seem :/

Fixed in V3 I believe.
>
> > +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> > +
> > +   /* Only need to check length strlen (s) + 1 if chr may be zero.
> > + Otherwise the last chr (which is known to be zero) can never
> > + be a match.  NB: We don't need to test if chr is a non-zero
> > + integer const with zero char bits because that is taken into
> > + account with is_strchr_zerop.  */
> > +   if (!tree_expr_nonzero_p (chr))
>
> The above is unsafe though.  tree_expr_nonzero_p (chr) will return true
> if say VRP can prove it is not zero, but because of the implicit
> (char) chr cast done by the function we need something different.
> Say if VRP determines that chr is in [1, INT_MAX] or even just [255, 257]
> it doesn't mean (char) chr won't be 0.
> So, as I've tried to explain in the previous mail, it can be done e.g. with

Added your code in V3. Thanks for the help.
>   bool chr_nonzero = false;
>   if (TREE_CODE (chr) == INTEGER_CST
>   && integer_nonzerop (fold_convert (char_type_node, chr)))
> chr_nonzero = true;
>   else if (TREE_CODE (chr) == SSA_NAME
>&& CHAR_TYPE_SIZE < INT_TYPE_SIZE)
> {
>   value_range r;
>   /* Try to determine using ranges if (char) chr must
>  be always 0.  That is true e.g. if all the subranges
>  have the INT_TYPE_SIZE - CHAR_TYPE_SIZE bits
>  the same on lower and upper bounds.  */
>   if (get_range_query (cfun)->range_of_expr (r, chr, stmt)
>   && r.kind () == VR_RANGE)
> {
>   chr_nonzero = true;
>   wide_int mask = 

[PATCH v3] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-21 Thread Noah Goldstein via Gcc-patches
This patch allows for strchr(x, c) to the replace with memchr(x, c,
strlen(x) + 1) if strlen(x) has already been computed earlier in the
tree.

Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821

Since memchr doesn't need to re-find the null terminator it is faster
than strchr.

bootstrapped and tested on x86_64-linux.

PR tree-optimization/95821

gcc/

* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
memchr instead of strchr if strlen already computed.

gcc/testsuite/

* c-c++-common/pr95821-1.c: New test.
* c-c++-common/pr95821-2.c: New test.
* c-c++-common/pr95821-3.c: New test.
* c-c++-common/pr95821-4.c: New test.
* c-c++-common/pr95821-5.c: New test.
* c-c++-common/pr95821-6.c: New test.
* c-c++-common/pr95821-7.c: New test.
* c-c++-common/pr95821-8.c: New test.
---
 gcc/testsuite/c-c++-common/pr95821-1.c |  15 
 gcc/testsuite/c-c++-common/pr95821-2.c |  17 
 gcc/testsuite/c-c++-common/pr95821-3.c |  17 
 gcc/testsuite/c-c++-common/pr95821-4.c |  16 
 gcc/testsuite/c-c++-common/pr95821-5.c |  19 +
 gcc/testsuite/c-c++-common/pr95821-6.c |  18 
 gcc/testsuite/c-c++-common/pr95821-7.c |  18 
 gcc/testsuite/c-c++-common/pr95821-8.c |  19 +
 gcc/tree-ssa-strlen.cc | 113 -
 9 files changed, 233 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-3.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-4.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-5.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-6.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-7.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-8.c

diff --git a/gcc/testsuite/c-c++-common/pr95821-1.c 
b/gcc/testsuite/c-c++-common/pr95821-1.c
new file mode 100644
index 000..e0beb609ea2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-2.c 
b/gcc/testsuite/c-c++-common/pr95821-2.c
new file mode 100644
index 000..5429f0586be
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-3.c 
b/gcc/testsuite/c-c++-common/pr95821-3.c
new file mode 100644
index 000..bc929c6044b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-4.c 
b/gcc/testsuite/c-c++-common/pr95821-4.c
new file mode 100644
index 000..684b41d5b70
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-5.c 
b/gcc/testsuite/c-c++-common/pr95821-5.c
new file mode 100644
index 000..00c1d93b614
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return strchr(s, c);
+}
+int main() {}
diff --git a/gcc/testsuite/c-c++-common/pr95821-6.c 
b/gcc/testsuite/c-c++-common/pr95821-6.c
new file mode 100644
index 000..dec839de5ea
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char * __restrict s, char c, 

[PATCH v2] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
This patch allows for strchr(x, c) to the replace with memchr(x, c,
strlen(x) + 1) if strlen(x) has already been computed earlier in the
tree.

Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821

Since memchr doesn't need to re-find the null terminator it is faster
than strchr.

bootstrapped and tested on x86_64-linux.

PR tree-optimization/95821

gcc/

* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
memchr instead of strchr if strlen already computed.

gcc/testsuite/

* c-c++-common/pr95821-1.c: New test.
* c-c++-common/pr95821-2.c: New test.
* c-c++-common/pr95821-3.c: New test.
* c-c++-common/pr95821-4.c: New test.
* c-c++-common/pr95821-5.c: New test.
* c-c++-common/pr95821-6.c: New test.
* c-c++-common/pr95821-7.c: New test.
* c-c++-common/pr95821-8.c: New test.
---
 gcc/testsuite/c-c++-common/pr95821-1.c | 15 +
 gcc/testsuite/c-c++-common/pr95821-2.c | 17 +
 gcc/testsuite/c-c++-common/pr95821-3.c | 17 +
 gcc/testsuite/c-c++-common/pr95821-4.c | 16 +
 gcc/testsuite/c-c++-common/pr95821-5.c | 19 ++
 gcc/testsuite/c-c++-common/pr95821-6.c | 18 ++
 gcc/testsuite/c-c++-common/pr95821-7.c | 18 ++
 gcc/testsuite/c-c++-common/pr95821-8.c | 19 ++
 gcc/tree-ssa-strlen.cc | 89 --
 9 files changed, 209 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-3.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-4.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-5.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-6.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-7.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-8.c

diff --git a/gcc/testsuite/c-c++-common/pr95821-1.c 
b/gcc/testsuite/c-c++-common/pr95821-1.c
new file mode 100644
index 000..e0beb609ea2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-2.c 
b/gcc/testsuite/c-c++-common/pr95821-2.c
new file mode 100644
index 000..5429f0586be
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-3.c 
b/gcc/testsuite/c-c++-common/pr95821-3.c
new file mode 100644
index 000..bc929c6044b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-4.c 
b/gcc/testsuite/c-c++-common/pr95821-4.c
new file mode 100644
index 000..684b41d5b70
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-5.c 
b/gcc/testsuite/c-c++-common/pr95821-5.c
new file mode 100644
index 000..00c1d93b614
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return strchr(s, c);
+}
+int main() {}
diff --git a/gcc/testsuite/c-c++-common/pr95821-6.c 
b/gcc/testsuite/c-c++-common/pr95821-6.c
new file mode 100644
index 000..dec839de5ea
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char * __restrict 

Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
On Mon, Jun 20, 2022 at 10:29 AM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 09:35:36AM -0700, Noah Goldstein via Gcc-patches 
> wrote:
> > This patch allows for strchr(x, c) to the replace with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
>
> Do you have a GCC Copyright assignment on file, or do you want to submit
> this under DCO ( https://gcc.gnu.org/dco.html )?  If the latter, there
> should be a Signed-off-by: line, both in the mail and later commit.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> > gcc/
> >
>
> As it fixes a GCC bugzilla bug, the ChangeLog entry should start with
> PR tree-optimization/95821
> line.
> > * tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
> >  already computed.
>
> All the indented lines in ChangeLog should be indented by tab.
> You are modifying strlen_pass::handle_builtin_strchr function, so after
> tree-ssa-strlen.cc there should be that function name in parens:
> * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> memchr ...

Fixed in v2.
>
> >
> > gcc/testsuite/
> >
> > * c-c++-common/pr95821-1.c
> > * c-c++-common/pr95821-2.c
> > * c-c++-common/pr95821-3.c
> > * c-c++-common/pr95821-4.c
> > * c-c++-common/pr95821-5.c
> > * c-c++-common/pr95821-6.c
>
> All the above lines should end with ": New test." after .c

Fixed in V2.
>
> > --- a/gcc/tree-ssa-strlen.cc
> > +++ b/gcc/tree-ssa-strlen.cc
>
> How does the patch relate to the one that H.J. attached in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821#c4 ?
>
> > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
> >  }
> >  }
> >
> > -/* Handle a strchr call.  If strlen of the first argument is known, replace
> > -   the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
> > -   that lhs of the call is endptr and strlen of the argument is endptr - 
> > x.  */
> > +/* Handle a strchr call.  If strlen of the first argument is known,
> > +   replace the strchr (x, 0) call with the endptr or x + strlen,
> > +   otherwise remember that lhs of the call is endptr and strlen of the
> > +   argument is endptr - x.  If strlen of x is not know but has been
> > +   computed earlier in the tree then replace strchr(x, c) to
> > +   memchr(x, c, strlen + 1).  */
>
> Space before ( even in comments.

Fixed in V2.
>
>
>
> >  void
> >  strlen_pass::handle_builtin_strchr ()
> > @@ -2418,8 +2421,8 @@ strlen_pass::handle_builtin_strchr ()
> >if (lhs == NULL_TREE)
> >  return;
> >
> > -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> > -return;
> > +  tree chr = gimple_call_arg (stmt, 1);
> > +  bool is_strchr_zerop = integer_zerop (chr);
> >
> >tree src = gimple_call_arg (stmt, 0);
> >
> > @@ -2452,32 +2455,56 @@ strlen_pass::handle_builtin_strchr ()
> > fprintf (dump_file, "Optimizing: ");
> > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> >   }
> > -   if (si != NULL && si->endptr != NULL_TREE)
> > +   if (!is_strchr_zerop)
> >   {
> > -   rhs = unshare_expr (si->endptr);
> > -   if (!useless_type_conversion_p (TREE_TYPE (lhs),
> > -   TREE_TYPE (rhs)))
> > - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> > +   /* If its not strchr(s, zerop) then try and convert to
> > +memchr if strlen has already been computed.  */
>
> Again, space before (.  The second line is weirdly formatted, should
> be indented below If.

Fixed in V2.
>
> > +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> > +   tree one = build_int_cst (TREE_TYPE (rhs), 1);
> > +   rhs = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (rhs),
> > +  unshare_expr (rhs), one);
> > +   tree size = make_ssa_name (TREE_TYPE (rhs));
> > +   gassign *size_stmt = gimple_build_assign (size, rhs);
> > +   gsi_insert_before (_gsi, size_stmt, GSI_SAME_STMT);
> > +   rhs = size;
> > +   if (!update_gimple_call (_gsi, fn, 3, src, chr, rhs))
> > + return;
>
> I think we shou

Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
On Mon, Jun 20, 2022 at 12:04 PM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 11:48:24AM -0700, Noah Goldstein wrote:
> > > I think we should differentiate more.  If integer_nonzerop (chr)
> > > or perhaps better tree_expr_nonzero_p (chr), then it is better
> > > to optimize t = strlen (x); ... p = strchr (x, c); to
> > > t = strlen (x); ... p = memchr (x, c, t);
> > What do you mean by differentiate more? More comments? Or
> > seperate the logic more?
>
> Different code, don't add the 1 to the strlen value whenever you know
> that chr can't be possibly 0 (either it is a non-zero constant,
> or the compiler can prove it won't be zero at runtime otherwise).
> Because if c is not 0, then memchr (x, c, strlen (x)) == memchr (x, c, strlen 
> (x) + 1),
> either c is among the first strlen (x) chars, or it will return NULL
> because x[strlen (x)] == 0.
>
> It actually is slightly more complicated, strchr second argument is int,
> but we just care about the low 8 bits.
> For TREE_CODE (chr) == INTEGER_CST, it is still trivial,
> say integer_nonzerop (fold_convert (char_type_node, chr))
> or equivalent using wide-int.h APIs.
> For SSA_NAMEs, we'd need get_zero_bits API, but we only have
> get_nonzero_bits, but we could say at least handle the case where
> get_ssa_name_range_info gives a VR_RANGE or set of them where none of
> the ranges include integral multiplies of 256.
> But for start perhaps just handling INTEGER_CST chr would be good enough.

Got it. Will have that in V2.

We could also make the initial:
bool is_strchr_zerop = integer_zerop (chr);

Only check the lower 8 bits.
>
> Jakub
>


Re: [PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
On Mon, Jun 20, 2022 at 10:29 AM Jakub Jelinek  wrote:
>
> On Mon, Jun 20, 2022 at 09:35:36AM -0700, Noah Goldstein via Gcc-patches 
> wrote:
> > This patch allows for strchr(x, c) to the replace with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
>
> Do you have a GCC Copyright assignment on file, or do you want to submit
> this under DCO ( https://gcc.gnu.org/dco.html )?  If the latter, there
> should be a Signed-off-by: line, both in the mail and later commit.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> > gcc/
> >
>
> As it fixes a GCC bugzilla bug, the ChangeLog entry should start with
> PR tree-optimization/95821
> line.
> > * tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
> >  already computed.
>
> All the indented lines in ChangeLog should be indented by tab.
> You are modifying strlen_pass::handle_builtin_strchr function, so after
> tree-ssa-strlen.cc there should be that function name in parens:
> * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> memchr ...
>
> >
> > gcc/testsuite/
> >
> > * c-c++-common/pr95821-1.c
> > * c-c++-common/pr95821-2.c
> > * c-c++-common/pr95821-3.c
> > * c-c++-common/pr95821-4.c
> > * c-c++-common/pr95821-5.c
> > * c-c++-common/pr95821-6.c
>
> All the above lines should end with ": New test." after .c
>
> > --- a/gcc/tree-ssa-strlen.cc
> > +++ b/gcc/tree-ssa-strlen.cc
>
> How does the patch relate to the one that H.J. attached in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821#c4 ?
>
> > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
> >  }
> >  }
> >
> > -/* Handle a strchr call.  If strlen of the first argument is known, replace
> > -   the strchr (x, 0) call with the endptr or x + strlen, otherwise remember
> > -   that lhs of the call is endptr and strlen of the argument is endptr - 
> > x.  */
> > +/* Handle a strchr call.  If strlen of the first argument is known,
> > +   replace the strchr (x, 0) call with the endptr or x + strlen,
> > +   otherwise remember that lhs of the call is endptr and strlen of the
> > +   argument is endptr - x.  If strlen of x is not know but has been
> > +   computed earlier in the tree then replace strchr(x, c) to
> > +   memchr(x, c, strlen + 1).  */
>
> Space before ( even in comments.
>
>
>
> >  void
> >  strlen_pass::handle_builtin_strchr ()
> > @@ -2418,8 +2421,8 @@ strlen_pass::handle_builtin_strchr ()
> >if (lhs == NULL_TREE)
> >  return;
> >
> > -  if (!integer_zerop (gimple_call_arg (stmt, 1)))
> > -return;
> > +  tree chr = gimple_call_arg (stmt, 1);
> > +  bool is_strchr_zerop = integer_zerop (chr);
> >
> >tree src = gimple_call_arg (stmt, 0);
> >
> > @@ -2452,32 +2455,56 @@ strlen_pass::handle_builtin_strchr ()
> > fprintf (dump_file, "Optimizing: ");
> > print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
> >   }
> > -   if (si != NULL && si->endptr != NULL_TREE)
> > +   if (!is_strchr_zerop)
> >   {
> > -   rhs = unshare_expr (si->endptr);
> > -   if (!useless_type_conversion_p (TREE_TYPE (lhs),
> > -   TREE_TYPE (rhs)))
> > - rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> > +   /* If its not strchr(s, zerop) then try and convert to
> > +memchr if strlen has already been computed.  */
>
> Again, space before (.  The second line is weirdly formatted, should
> be indented below If.
>
> > +   tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> > +   tree one = build_int_cst (TREE_TYPE (rhs), 1);
> > +   rhs = fold_build2_loc (loc, PLUS_EXPR, TREE_TYPE (rhs),
> > +  unshare_expr (rhs), one);
> > +   tree size = make_ssa_name (TREE_TYPE (rhs));
> > +   gassign *size_stmt = gimple_build_assign (size, rhs);
> > +   gsi_insert_before (_gsi, size_stmt, GSI_SAME_STMT);
> > +   rhs = size;
> > +   if (!update_gimple_call (_gsi, fn, 3, src, chr, rhs))
> > + return;
>
> I think we should differentiate more.  If integer_nonzerop (chr)
> or perhaps better tree_expr_nonzero_p (chr), then it is better
> to optimize t = strlen (x); ... p = strchr (x, c); to
> t = strlen (x); ... p = memchr (x, c, t);
What do you mean by differentiate more? More comments? Or
seperate the logic more?

> the t + 1 is only needed if c might be zero.
>
> > +   /* Don't update strlen of lhs if search-char was non-zero.  */
>
> Wasn't known to be zero is the right thing.
>
> Jakub
>


[PATCH v1] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-06-20 Thread Noah Goldstein via Gcc-patches
This patch allows for strchr(x, c) to the replace with memchr(x, c,
strlen(x) + 1) if strlen(x) has already been computed earlier in the
tree.

Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821

Since memchr doesn't need to re-find the null terminator it is faster
than strchr.

bootstrapped and tested on x86_64-linux.

gcc/

* tree-ssa-strlen.cc: Emit memchr instead of strchr if strlen
 already computed.

gcc/testsuite/

* c-c++-common/pr95821-1.c
* c-c++-common/pr95821-2.c
* c-c++-common/pr95821-3.c
* c-c++-common/pr95821-4.c
* c-c++-common/pr95821-5.c
* c-c++-common/pr95821-6.c
---
 gcc/testsuite/c-c++-common/pr95821-1.c | 15 ++
 gcc/testsuite/c-c++-common/pr95821-2.c | 17 +++
 gcc/testsuite/c-c++-common/pr95821-3.c | 17 +++
 gcc/testsuite/c-c++-common/pr95821-4.c | 16 ++
 gcc/testsuite/c-c++-common/pr95821-5.c | 19 +++
 gcc/testsuite/c-c++-common/pr95821-6.c | 18 +++
 gcc/tree-ssa-strlen.cc | 69 +++---
 7 files changed, 152 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-2.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-3.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-4.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-5.c
 create mode 100644 gcc/testsuite/c-c++-common/pr95821-6.c

diff --git a/gcc/testsuite/c-c++-common/pr95821-1.c 
b/gcc/testsuite/c-c++-common/pr95821-1.c
new file mode 100644
index 000..e0beb609ea2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-2.c 
b/gcc/testsuite/c-c++-common/pr95821-2.c
new file mode 100644
index 000..5429f0586be
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-3.c 
b/gcc/testsuite/c-c++-common/pr95821-3.c
new file mode 100644
index 000..bc929c6044b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+   size_t slen = __builtin_strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return __builtin_strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-4.c 
b/gcc/testsuite/c-c++-common/pr95821-4.c
new file mode 100644
index 000..684b41d5b70
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   return strchr(s, c);
+}
diff --git a/gcc/testsuite/c-c++-common/pr95821-5.c 
b/gcc/testsuite/c-c++-common/pr95821-5.c
new file mode 100644
index 000..00c1d93b614
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char *s, char c, char * other)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return strchr(s, c);
+}
+int main() {}
diff --git a/gcc/testsuite/c-c++-common/pr95821-6.c 
b/gcc/testsuite/c-c++-common/pr95821-6.c
new file mode 100644
index 000..dec839de5ea
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr95821-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "memchr" } } */
+
+#include 
+#include 
+
+char *
+foo (char * __restrict s, char c, char * __restrict other)
+{
+   size_t slen = strlen(s);
+   if(slen < 1000)
+   return NULL;
+
+   *other = 0;
+
+   return strchr(s, c);
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 1d4c0f78fbf..d959a530ea0 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen ()
 }
 }
 
-/* Handle a strchr call.  If strlen of the first argument is known, 

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Noah Goldstein via Gcc-patches
On Fri, Jul 16, 2021 at 3:37 PM Matthias Kretz  wrote:

> On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> > On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz  wrote:
> > > I don't understand how this feature would lead to false sharing. But
> maybe
> > > I
> > > misunderstand the spatial prefetcher. The first access to one of the
> two
> > > cache
> > > lines pairs would bring both cache lines to LLC (and possibly L2). If a
> > > core
> > > with a different L2 reads the other cache line the cache line would be
> > > duplicated; if it writes to it, it would be exclusive to the other
> core's
> > > L2.
> > > The cache line pairs do not affect each other anymore. Maybe there's a
> > > minor
> > > inefficiency on initial transfer from memory, but isn't that all?
> >
> > If two cores that do not share an L2 cache need exclusive access to
> > a cache-line, the L2 spatial prefetcher could cause pingponging if those
> > two cache-lines were adjacent and shared the same 128 byte alignment.
> > Say core A requests line x1 in exclusive, it also get line x2 (not sure
> > if x2 would be in shared or exclusive), core B then requests x2 in
> > exclusive,
> > it also gets x1. Irrelevant of the state x1 comes into core B's private
> L2
> > cache
> > it invalidates the exclusive state on cache-line x1 in core A's private
> L2
> > cache. If this was done in a loop (say a simple `lock add` loop) it would
> > cause
> > pingponging on cache-lines x1/x2 between core A and B's private L2
> caches.
>
> Quoting the latest ORM: "The following two hardware prefetchers fetched
> data
> from memory to the L2 cache and last level cache:
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk."
>
> 1. If the requested cache line is already present on some other core, the
> spatial prefetcher should not get used ("fetched data from memory").
>

I think this is correct and I'm incorrect that a request from LLC to L2
will invoke the spatial prefetcher. So not issues with 64 bytes. Sorry for
the added confusion!

>
> 2. The section is about data prefetching. It is unclear whether the
> spatial
> prefetcher applies at all for normal cache line fetches.
>
> 3. The ORM uses past tense ("The following two hardware prefetchers
> fetched
> data"), which indicates to me that Intel isn't doing this for newer
> generations anymore.


> 4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of
> cache
> line A and the adjacent cache line B thus is also loaded to LLC. Core 2
> request a read of line B and thus loads line A into LLC. Now both cores
> have
> both cache lines in LLC. Core 1 writes to line A, which invalidates line A
> in
> LLC of Core 2 but does not affect line B. Core 2 writes to line B,
> invalidating line A for Core 1. => no false sharing. Where did I get my
> mental
> cache protocol wrong?


> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──
>
>
>
>


Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Noah Goldstein via Gcc-patches
On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz  wrote:

> On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > > Currently the patch does not adjust the values based on -march, as in
> JF's
> > > proposal.  I'll need more guidance from the ARM/AArch64 maintainers
> about
> > > how to go about that.  --param l1-cache-line-size is set based on
> -mtune,
> > > but I don't think we want -mtune to change these ABI-affecting values.
> > > Are
> > > there -march values for which a smaller range than 64-256 makes sense?
>
> As a user who cares about ABI but also cares about maximizing performance
> of
> builds for a specific HPC setup I'd expect the hardware interference size
> values to be allowed to break ABIs. The point of these values is to give
> me
> better performance portability (but not necessarily binary portability)
> than
> my usual "pick 64 as a good average".


> Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X
> should
> be interpreted as "my binary is supposed to be optimized for X, I accept
> inefficiencies on everything that's not X".
>
> On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> > On intel x86 systems with a private L2 cache the spatial prefetcher
> > can cause destructive interference along 128 byte aligned boundaries.
> >
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-3
> > 2-architectures-optimization-manual.pdf#page=60
>
> I don't understand how this feature would lead to false sharing. But maybe
> I
> misunderstand the spatial prefetcher. The first access to one of the two
> cache
> lines pairs would bring both cache lines to LLC (and possibly L2). If a
> core
> with a different L2 reads the other cache line the cache line would be
> duplicated; if it writes to it, it would be exclusive to the other core's
> L2.
> The cache line pairs do not affect each other anymore. Maybe there's a
> minor
> inefficiency on initial transfer from memory, but isn't that all?
>

If two cores that do not share an L2 cache need exclusive access to
a cache-line, the L2 spatial prefetcher could cause pingponging if those
two cache-lines were adjacent and shared the same 128 byte alignment.
Say core A requests line x1 in exclusive, it also get line x2 (not sure
if x2 would be in shared or exclusive), core B then requests x2 in
exclusive,
it also gets x1. Irrelevant of the state x1 comes into core B's private L2
cache
it invalidates the exclusive state on cache-line x1 in core A's private L2
cache. If this was done in a loop (say a simple `lock add` loop) it would
cause
pingponging on cache-lines x1/x2 between core A and B's private L2 caches.


>
> That said. Intel documents the spatial prefetcher exclusively for Sandy
> Bridge. So if you still believe 128 is necessary, set the destructive
> hardware
> interference size to 64 for all of x86 except -mtune=sandybridge.
>

AFAIK the spatial prefetcher exists on newer x86_64 machines as well.


>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──
>


Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-15 Thread Noah Goldstein via Gcc-patches
On Thu, Jul 15, 2021 at 10:41 PM Jason Merrill via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Adding CCs that got lost in the initial mail.
>
> On Thu, Jul 15, 2021 at 10:36 PM Jason Merrill  wrote:
>
> > The last missing piece of the C++17 standard library is the hardware
> > intereference size constants.  Much of the delay in implementing these
> has
> > been due to uncertainty about what the right values are, and even whether
> > there is a single constant value that is suitable; the destructive
> > interference size is intended to be used in structure layout, so program
> > ABIs will depend on it.
> >
> > In principle, both of these values should be the same as the target's L1
> > cache line size.  When compiling for a generic target that is intended to
> > support a range of target CPUs with different cache line sizes, the
> > constructive size should probably be the minimum size, and the
> destructive
> > size the maximum, unless you are constrained by ABI compatibility with
> > previous code.
> >
> > JF Bastien's implementation proposal is summarized at
> > https://github.com/itanium-cxx-abi/cxx-abi/issues/74
> >
> > I implement this by adding new --params for the two sizes.  Targets need
> to
> > override these values in targetm.target_option.override() to support the
> > feature.
> >
> > 64 bytes still seems correct for the x86 family.
> >
> > I'm not sure why he said 64/64 for 32-bit ARM, since the Cortex A9 has a
> > 32-byte cache line, and that seems to be the only ARM_PREFETCH_BENEFICIAL
> > target, so I'd think 32/64 would make more sense.
> >
> > He proposed 64/128 for AArch64, but since the A64FX now has a 256B cache
> > line, I've changed that to 64/256.  Does that seem right?
> >
> > Currently the patch does not adjust the values based on -march, as in
> JF's
> > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > but I don't think we want -mtune to change these ABI-affecting values.
> Are
> > there -march values for which a smaller range than 64-256 makes sense?
> >
> > gcc/ChangeLog:
> >
> > * params.opt: Add destructive-interference-size and
> > constructive-interference-size.
> > * doc/invoke.texi: Document them.
> > * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > Set them.
> > * config/arm/arm.c (arm_option_override): Set them.
> > * config/i386/i386-options.c (ix86_option_override_internal):
> > Set them.
> >
> > gcc/c-family/ChangeLog:
> >
> > * c.opt: Add -Winterference-size.
> > * c-cppbuiltin.c (cpp_atomic_builtins): Add
> __GCC_DESTRUCTIVE_SIZE
> > and __GCC_CONSTRUCTIVE_SIZE.
> >
> > gcc/cp/ChangeLog:
> >
> > * decl.c (cxx_init_decl_processing): Check
> > --param *-interference-size values.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/std/version: Define
> __cpp_lib_hardware_interference_size.
> > * libsupc++/new: Define hardware interference size variables.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.target/aarch64/interference.C: New test.
> > * g++.target/arm/interference.C: New test.
> > * g++.target/i386/interference.C: New test.
> > ---
> >  gcc/doc/invoke.texi   | 22 ++
> >  gcc/c-family/c.opt|  5 
> >  gcc/params.opt| 15 
> >  gcc/c-family/c-cppbuiltin.c   | 12 ++
> >  gcc/config/aarch64/aarch64.c  |  9 
> >  gcc/config/arm/arm.c  |  6 +
> >  gcc/config/i386/i386-options.c|  6 +
> >  gcc/cp/decl.c | 23 +++
> >  .../g++.target/aarch64/interference.C |  9 
> >  gcc/testsuite/g++.target/arm/interference.C   |  9 
> >  gcc/testsuite/g++.target/i386/interference.C  |  8 +++
> >  libstdc++-v3/include/std/version  |  3 +++
> >  libstdc++-v3/libsupc++/new| 10 ++--
> >  13 files changed, 135 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/aarch64/interference.C
> >  create mode 100644 gcc/testsuite/g++.target/arm/interference.C
> >  create mode 100644 gcc/testsuite/g++.target/i386/interference.C
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index ea8812425e9..f93cb7a20f7 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13857,6 +13857,28 @@ prefetch hints can be issued for any constant
> > stride.
> >
> >  This setting is only useful for strides that are known and constant.
> >
> > +@item destructive_interference_size
> > +@item constructive_interference_size
> > +The values for the C++17 variables
> > +@code{std::hardware_destructive_interference_size} and
> >