Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-17 Thread Aaron Sawdey
On Tue, 2017-01-17 at 08:30 -0600, Peter Bergner wrote:
> On 1/16/17 3:09 PM, Aaron Sawdey wrote:
> > Here is an updated version of this patch.
> > 
> > Tulio noted that glibc's strncmp test was failing. This turned out
> > to
> > be the use of signed HOST_WIDE_INT for handling strncmp length. The
> > glibc test calls strncmp with length 2^64-1, presumably to provoke
> > exactly this type of bug. Fixing the issue required changing
> > select_block_compare_mode() and expand_block_compare() as well.
> 
> If glibc's testsuite exposed a bug, then we should also add a similar
> bug to our testsuite.  I scanned the patch and I'm not sure I see
> that exact test scenario.  Is it there and I'm just not seeing it?
> 
> Peter
> 

Nope, you didn't miss it, Peter. I will add such a test as a separate
patch, this one has dragged on for a long time. I have another more
comprehensive test case for strcmp/strncmp I want to add anyway.

Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-17 Thread Peter Bergner

On 1/16/17 3:09 PM, Aaron Sawdey wrote:

Here is an updated version of this patch.

Tulio noted that glibc's strncmp test was failing. This turned out to
be the use of signed HOST_WIDE_INT for handling strncmp length. The
glibc test calls strncmp with length 2^64-1, presumably to provoke
exactly this type of bug. Fixing the issue required changing
select_block_compare_mode() and expand_block_compare() as well.


If glibc's testsuite exposed a bug, then we should also add a similar
bug to our testsuite.  I scanned the patch and I'm not sure I see
that exact test scenario.  Is it there and I'm just not seeing it?

Peter





Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-16 Thread Segher Boessenkool
On Mon, Jan 16, 2017 at 03:09:35PM -0600, Aaron Sawdey wrote:
> Tulio noted that glibc's strncmp test was failing. This turned out to
> be the use of signed HOST_WIDE_INT for handling strncmp length. The
> glibc test calls strncmp with length 2^64-1, presumably to provoke
> exactly this type of bug. Fixing the issue required changing
> select_block_compare_mode() and expand_block_compare() as well.
> 
> The other change is if we must emit a runtime check for the 4k
> crossing, then we might as well set base_align to 8 and emit the best
> possible code.

Some nits...

> --- gcc/config/rs6000/rs6000.c(revision 244503)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -19310,7 +19310,8 @@
> WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is
> the largest allowable mode.  */
>  static machine_mode
> -select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes,
> +select_block_compare_mode (unsigned HOST_WIDE_INT offset,
> +unsigned HOST_WIDE_INT bytes,
>  HOST_WIDE_INT align, bool word_mode_ok)

"align" should probably be unsigned as well?

> @@ -19410,8 +19411,8 @@
>gcc_assert (GET_MODE (target) == SImode);
>  
>/* Anything to move? */
> -  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
> -  if (bytes <= 0)
> +  unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
> +  if (bytes == 0)
>  return true;

UINTVAL?  Please check the rest of the patch for this, too.

>/* We don't want to generate too much code.  */
>if (ROUND_UP (bytes, load_mode_size) / load_mode_size
> -  > rs6000_block_compare_inline_limit)
> +  > (unsigned HOST_WIDE_INT)rs6000_block_compare_inline_limit)
>  return false;

Space after cast operator.  Why do you need a cast at all?  It already
is unsigned.

> +   /* -m32 -mpowerpc64 results in word_mode being DImode even
> +  though otherwise it is 32-bit. The length arg to strncmp
> +  is a size_t which will be the same size as pointers.  */
> +   rtx len_rtx;
> +   if (TARGET_64BIT)
> + len_rtx = gen_reg_rtx(DImode);
> +   else
> + len_rtx = gen_reg_rtx(SImode);

Space before opening paren in function calls.

> +  while (bytes_to_compare > 0)
>  {
>/* Compare sequence:
>   check each 8B with: ld/ld cmpd bne
> +  If equal, use rldicr/cmpb to check for zero byte.
>   cleanup code at end:
>   cmpb  get byte that differs
>   cmpb  look for zero byte

Mixed spaces/tabs indent.

> @@ -19919,37 +19968,45 @@
>   }
>   }
>  
> -  int remain = bytes - cmp_bytes;
> +  /* Cases to handle.  A and B are chunks of the two strings.
> + 1: Not end of comparison:
> +A != B: branch to cleanup code to compute result.
> +   A == B: check for 0 byte, next block if not found.
> + 2: End of the inline comparison:
> +A != B: branch to cleanup code to compute result.
> +   A == B: check for 0 byte, call strcmp/strncmp
> +  3: compared requested N bytes:
> +A == B: branch to result 0.
> +   A != B: cleanup code to compute result.  */

And here.

> @@ -19957,10 +20014,102 @@
>JUMP_LABEL (j) = dst_label;
>LABEL_NUSES (dst_label) += 1;
>  
> +  if (remain > 0 || equality_compare_rest)
> + {
> +   /* Generate a cmpb to test for a 0 byte and branch
> +  to final result if found.  */
> +   rtx cmpb_zero = gen_reg_rtx (word_mode);
> +   rtx lab_ref_fin = gen_rtx_LABEL_REF (VOIDmode, final_move_label);
> +   rtx condz = gen_reg_rtx (CCmode);
> +   rtx zero_reg = gen_reg_rtx (word_mode);
> +   if (word_mode == SImode)
> + {
> +   emit_insn (gen_movsi (zero_reg, GEN_INT(0)));

Space before (0).

> +   emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
> +   if ( cmp_bytes < word_mode_size )

No spaces inside the parens.

> + { /* Don't want to look at zero bytes past end.  */

Put the comment on a separate line please.

> +   emit_insn (gen_movdi (zero_reg, GEN_INT(0)));
> +   emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
> +   if ( cmp_bytes < word_mode_size )
> + { /* Don't want to look at zero bytes past end.  */

Again and again and again :-)

Looks good otherwise.


Segher


Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-16 Thread Aaron Sawdey
Here is an updated version of this patch. 

Tulio noted that glibc's strncmp test was failing. This turned out to
be the use of signed HOST_WIDE_INT for handling strncmp length. The
glibc test calls strncmp with length 2^64-1, presumably to provoke
exactly this type of bug. Fixing the issue required changing
select_block_compare_mode() and expand_block_compare() as well.

The other change is if we must emit a runtime check for the 4k
crossing, then we might as well set base_align to 8 and emit the best
possible code.

OK for trunk if bootstrap/regtest passes on ppc64/ppc64le?

Thanks,
  Aaron

On Wed, 2017-01-11 at 11:26 -0600, Aaron Sawdey wrote:
> This expands on the previous patch. For strcmp and for strncmp with N
> larger than 64, the first 64 bytes of comparison is expanded inline
> and
> then a call to strcmp or strncmp is emitted to compare the remainder
> if
> the strings are found to be equal at that point. 
> 
> -mstring-compare-inline-limit=N determines how many block comparisons
> are emitted. With the default 8, and 64-bit code, you get 64 bytes. 
> 
> Performance testing on a power8 system shows that the code is
> anywhere
> from 2-8 times faster than RHEL7.2 glibc strcmp/strncmp depending on
> alignment and length.
> 
> In the process of adding this I discovered that the expansion being
> generated for strncmp had a bug in that it was not testing for a zero
> byte to terminate the comparison. As a result inputs like
> strncmp("AB\0CDEFGX", "AB\0CDEFGY", 9) would be compared not equal.
> The
> initial comparison of a doubleword would be equal so a second one
> would
> be fetched and compared, ignoring the zero byte that should have
> terminated comparison. The fix is to use a cmpb to check for zero
> bytes
> in the equality case before comparing the next chunk. I updated the
> strncmp-1.c test case to check for this, and also added a new 
> strcmp-1.c test case to check strcmp expansion. Also both now have a
> length 100 tests to check the transition from the inline comparison
> to
> the library call for the remainder.
> 
> ChangeLog
> 2017-01-11  Aaron Sawdey  
>   * config/rs6000/rs6000-protos.h (expand_strn_compare): Add arg.
>   * config/rs6000/rs6000.c (expand_strn_compare): Add ability to
> expand
>   strcmp. Fix bug where comparison didn't stop with zero byte.
>   * config/rs6000/rs6000.md (cmpstrnsi): Args to
> expand_strn_compare.
>   (cmpstrsi): Add pattern.
> 
> gcc.dg/ChangeLog
> 2017-01-11  Aaron Sawdey  
>   * gcc.dg/strcmp-1.c: New.
>   * gcc.dg/strncmp-1.c: Add test for a bug that escaped.
> 
> 
> 
> 
-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h	(revision 244503)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,7 +78,7 @@
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
-extern bool expand_strn_compare (rtx[]);
+extern bool expand_strn_compare (rtx[], int);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 244503)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19310,7 +19310,8 @@
WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is
the largest allowable mode.  */
 static machine_mode
-select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes,
+select_block_compare_mode (unsigned HOST_WIDE_INT offset,
+			   unsigned HOST_WIDE_INT bytes,
 			   HOST_WIDE_INT align, bool word_mode_ok)
 {
   /* First see if we can do a whole load unit
@@ -19322,7 +19323,7 @@
  Do largest chunk possible without violating alignment rules.  */
 
   /* The most we can read without potential page crossing.  */
-  HOST_WIDE_INT maxread = ROUND_UP (bytes, align);
+  unsigned HOST_WIDE_INT maxread = ROUND_UP (bytes, align);
 
   if (word_mode_ok && bytes >= UNITS_PER_WORD)
 return word_mode;
@@ -19410,8 +19411,8 @@
   gcc_assert (GET_MODE (target) == SImode);
 
   /* Anything to move? */
-  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
-  if (bytes <= 0)
+  unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
+  if (bytes == 0)
 return true;
 
   /* The code generated for p7 and older is not faster than glibc
@@ -19432,14 +19433,14 @@
 
   /* Strategy phase.  How many ops will this take and should we expand it?  */
 
-  int offset = 0;
+  unsigned HOST_WIDE_INT offset = 0;
   machine_mode load_mode =
 select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
-  int 

[PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-11 Thread Aaron Sawdey
This expands on the previous patch. For strcmp and for strncmp with N
larger than 64, the first 64 bytes of comparison is expanded inline and
then a call to strcmp or strncmp is emitted to compare the remainder if
the strings are found to be equal at that point. 

-mstring-compare-inline-limit=N determines how many block comparisons
are emitted. With the default 8, and 64-bit code, you get 64 bytes. 

Performance testing on a power8 system shows that the code is anywhere
from 2-8 times faster than RHEL7.2 glibc strcmp/strncmp depending on
alignment and length.

In the process of adding this I discovered that the expansion being
generated for strncmp had a bug in that it was not testing for a zero
byte to terminate the comparison. As a result inputs like
strncmp("AB\0CDEFGX", "AB\0CDEFGY", 9) would be compared not equal. The
initial comparison of a doubleword would be equal so a second one would
be fetched and compared, ignoring the zero byte that should have
terminated comparison. The fix is to use a cmpb to check for zero bytes
in the equality case before comparing the next chunk. I updated the
strncmp-1.c test case to check for this, and also added a new 
strcmp-1.c test case to check strcmp expansion. Also both now have a
length 100 tests to check the transition from the inline comparison to
the library call for the remainder.

ChangeLog
2017-01-11  Aaron Sawdey  
* config/rs6000/rs6000-protos.h (expand_strn_compare): Add arg.
* config/rs6000/rs6000.c (expand_strn_compare): Add ability to expand
strcmp. Fix bug where comparison didn't stop with zero byte.
* config/rs6000/rs6000.md (cmpstrnsi): Args to expand_strn_compare.
(cmpstrsi): Add pattern.

gcc.dg/ChangeLog
2017-01-11  Aaron Sawdey  
* gcc.dg/strcmp-1.c: New.
* gcc.dg/strncmp-1.c: Add test for a bug that escaped.




-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h	(revision 244322)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,7 +78,7 @@
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
-extern bool expand_strn_compare (rtx[]);
+extern bool expand_strn_compare (rtx[], int);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 244322)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19635,22 +19635,36 @@
OPERANDS[0] is the target (result).
OPERANDS[1] is the first source.
OPERANDS[2] is the second source.
+   If NO_LENGTH is zero, then:
OPERANDS[3] is the length.
-   OPERANDS[4] is the alignment in bytes.  */
+   OPERANDS[4] is the alignment in bytes.
+   If NO_LENGTH is nonzero, then:
+   OPERANDS[3] is the alignment in bytes.  */
 bool
-expand_strn_compare (rtx operands[])
+expand_strn_compare (rtx operands[], int no_length)
 {
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
+  rtx bytes_rtx, align_rtx;
+  if (no_length)
+{
+  bytes_rtx = NULL;
+  align_rtx = operands[3];
+}
+  else
+{
+  bytes_rtx = operands[3];
+  align_rtx = operands[4];
+}
   HOST_WIDE_INT cmp_bytes = 0;
   rtx src1 = orig_src1;
   rtx src2 = orig_src2;
 
-  /* If this is not a fixed size compare, just call strncmp.  */
-  if (!CONST_INT_P (bytes_rtx))
+  /* If we have a length, it must be constant. This simplifies things
+ a bit as we don't have to generate code to check if we've exceeded
+ the length. Later this could be expanded to handle this case.  */
+  if (!no_length && !CONST_INT_P (bytes_rtx))
 return false;
 
   /* This must be a fixed size alignment.  */
@@ -19668,8 +19682,6 @@
 
   gcc_assert (GET_MODE (target) == SImode);
 
-  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
-
   /* If we have an LE target without ldbrx and word_mode is DImode,
  then we must avoid using word_mode.  */
   int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
@@ -19678,16 +19690,37 @@
   int word_mode_size = GET_MODE_SIZE (word_mode);
 
   int offset = 0;
+  HOST_WIDE_INT bytes; /* N from the strncmp args if available.  */
+  HOST_WIDE_INT compare_length; /* How much we are going to compare inline.  */
+  if (no_length)
+/* Use this as a standin to determine the mode to use.  */
+bytes = rs6000_string_compare_inline_limit * word_mode_size;
+  else
+bytes = INTVAL (bytes_rtx);
+
   machine_mode