Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000
On Tue, 2017-01-17 at 08:30 -0600, Peter Bergner wrote: > On 1/16/17 3:09 PM, Aaron Sawdey wrote: > > Here is an updated version of this patch. > > > > Tulio noted that glibc's strncmp test was failing. This turned out > > to > > be the use of signed HOST_WIDE_INT for handling strncmp length. The > > glibc test calls strncmp with length 2^64-1, presumably to provoke > > exactly this type of bug. Fixing the issue required changing > > select_block_compare_mode() and expand_block_compare() as well. > > If glibc's testsuite exposed a bug, then we should also add a similar > bug to our testsuite. I scanned the patch and I'm not sure I see > that exact test scenario. Is it there and I'm just not seeing it? > > Peter > Nope, you didn't miss it, Peter. I will add such a test as a separate patch, this one has dragged on for a long time. I have another more comprehensive test case for strcmp/strncmp I want to add anyway. Aaron -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain
Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000
On 1/16/17 3:09 PM, Aaron Sawdey wrote: Here is an updated version of this patch. Tulio noted that glibc's strncmp test was failing. This turned out to be the use of signed HOST_WIDE_INT for handling strncmp length. The glibc test calls strncmp with length 2^64-1, presumably to provoke exactly this type of bug. Fixing the issue required changing select_block_compare_mode() and expand_block_compare() as well. If glibc's testsuite exposed a bug, then we should also add a similar bug to our testsuite. I scanned the patch and I'm not sure I see that exact test scenario. Is it there and I'm just not seeing it? Peter
Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000
On Mon, Jan 16, 2017 at 03:09:35PM -0600, Aaron Sawdey wrote: > Tulio noted that glibc's strncmp test was failing. This turned out to > be the use of signed HOST_WIDE_INT for handling strncmp length. The > glibc test calls strncmp with length 2^64-1, presumably to provoke > exactly this type of bug. Fixing the issue required changing > select_block_compare_mode() and expand_block_compare() as well. > > The other change is if we must emit a runtime check for the 4k > crossing, then we might as well set base_align to 8 and emit the best > possible code. Some nits... > --- gcc/config/rs6000/rs6000.c(revision 244503) > +++ gcc/config/rs6000/rs6000.c(working copy) > @@ -19310,7 +19310,8 @@ > WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is > the largest allowable mode. */ > static machine_mode > -select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes, > +select_block_compare_mode (unsigned HOST_WIDE_INT offset, > +unsigned HOST_WIDE_INT bytes, > HOST_WIDE_INT align, bool word_mode_ok) "align" should probably be unsigned as well? > @@ -19410,8 +19411,8 @@ >gcc_assert (GET_MODE (target) == SImode); > >/* Anything to move? */ > - HOST_WIDE_INT bytes = INTVAL (bytes_rtx); > - if (bytes <= 0) > + unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx); > + if (bytes == 0) > return true; UINTVAL? Please check the rest of the patch for this, too. >/* We don't want to generate too much code. */ >if (ROUND_UP (bytes, load_mode_size) / load_mode_size > - > rs6000_block_compare_inline_limit) > + > (unsigned HOST_WIDE_INT)rs6000_block_compare_inline_limit) > return false; Space after cast operator. Why do you need a cast at all? It already is unsigned. > + /* -m32 -mpowerpc64 results in word_mode being DImode even > + though otherwise it is 32-bit. The length arg to strncmp > + is a size_t which will be the same size as pointers. */ > + rtx len_rtx; > + if (TARGET_64BIT) > + len_rtx = gen_reg_rtx(DImode); > + else > + len_rtx = gen_reg_rtx(SImode); Space before opening paren in function calls. > + while (bytes_to_compare > 0) > { >/* Compare sequence: > check each 8B with: ld/ld cmpd bne > + If equal, use rldicr/cmpb to check for zero byte. > cleanup code at end: > cmpb get byte that differs > cmpb look for zero byte Mixed spaces/tabs indent. > @@ -19919,37 +19968,45 @@ > } > } > > - int remain = bytes - cmp_bytes; > + /* Cases to handle. A and B are chunks of the two strings. > + 1: Not end of comparison: > +A != B: branch to cleanup code to compute result. > + A == B: check for 0 byte, next block if not found. > + 2: End of the inline comparison: > +A != B: branch to cleanup code to compute result. > + A == B: check for 0 byte, call strcmp/strncmp > + 3: compared requested N bytes: > +A == B: branch to result 0. > + A != B: cleanup code to compute result. */ And here. > @@ -19957,10 +20014,102 @@ >JUMP_LABEL (j) = dst_label; >LABEL_NUSES (dst_label) += 1; > > + if (remain > 0 || equality_compare_rest) > + { > + /* Generate a cmpb to test for a 0 byte and branch > + to final result if found. */ > + rtx cmpb_zero = gen_reg_rtx (word_mode); > + rtx lab_ref_fin = gen_rtx_LABEL_REF (VOIDmode, final_move_label); > + rtx condz = gen_reg_rtx (CCmode); > + rtx zero_reg = gen_reg_rtx (word_mode); > + if (word_mode == SImode) > + { > + emit_insn (gen_movsi (zero_reg, GEN_INT(0))); Space before (0). > + emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg)); > + if ( cmp_bytes < word_mode_size ) No spaces inside the parens. > + { /* Don't want to look at zero bytes past end. */ Put the comment on a separate line please. > + emit_insn (gen_movdi (zero_reg, GEN_INT(0))); > + emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg)); > + if ( cmp_bytes < word_mode_size ) > + { /* Don't want to look at zero bytes past end. */ Again and again and again :-) Looks good otherwise. Segher
Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000
Here is an updated version of this patch. Tulio noted that glibc's strncmp test was failing. This turned out to be the use of signed HOST_WIDE_INT for handling strncmp length. The glibc test calls strncmp with length 2^64-1, presumably to provoke exactly this type of bug. Fixing the issue required changing select_block_compare_mode() and expand_block_compare() as well. The other change is if we must emit a runtime check for the 4k crossing, then we might as well set base_align to 8 and emit the best possible code. OK for trunk if bootstrap/regtest passes on ppc64/ppc64le? Thanks, Aaron On Wed, 2017-01-11 at 11:26 -0600, Aaron Sawdey wrote: > This expands on the previous patch. For strcmp and for strncmp with N > larger than 64, the first 64 bytes of comparison is expanded inline > and > then a call to strcmp or strncmp is emitted to compare the remainder > if > the strings are found to be equal at that point. > > -mstring-compare-inline-limit=N determines how many block comparisons > are emitted. With the default 8, and 64-bit code, you get 64 bytes. > > Performance testing on a power8 system shows that the code is > anywhere > from 2-8 times faster than RHEL7.2 glibc strcmp/strncmp depending on > alignment and length. > > In the process of adding this I discovered that the expansion being > generated for strncmp had a bug in that it was not testing for a zero > byte to terminate the comparison. As a result inputs like > strncmp("AB\0CDEFGX", "AB\0CDEFGY", 9) would be compared not equal. > The > initial comparison of a doubleword would be equal so a second one > would > be fetched and compared, ignoring the zero byte that should have > terminated comparison. The fix is to use a cmpb to check for zero > bytes > in the equality case before comparing the next chunk. I updated the > strncmp-1.c test case to check for this, and also added a new > strcmp-1.c test case to check strcmp expansion. Also both now have a > length 100 tests to check the transition from the inline comparison > to > the library call for the remainder. > > ChangeLog > 2017-01-11 Aaron Sawdey> * config/rs6000/rs6000-protos.h (expand_strn_compare): Add arg. > * config/rs6000/rs6000.c (expand_strn_compare): Add ability to > expand > strcmp. Fix bug where comparison didn't stop with zero byte. > * config/rs6000/rs6000.md (cmpstrnsi): Args to > expand_strn_compare. > (cmpstrsi): Add pattern. > > gcc.dg/ChangeLog > 2017-01-11 Aaron Sawdey > * gcc.dg/strcmp-1.c: New. > * gcc.dg/strncmp-1.c: Add test for a bug that escaped. > > > > -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 244503) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -78,7 +78,7 @@ extern int expand_block_clear (rtx[]); extern int expand_block_move (rtx[]); extern bool expand_block_compare (rtx[]); -extern bool expand_strn_compare (rtx[]); +extern bool expand_strn_compare (rtx[], int); extern const char * rs6000_output_load_multiple (rtx[]); extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode); extern bool rs6000_is_valid_and_mask (rtx, machine_mode); Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 244503) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -19310,7 +19310,8 @@ WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is the largest allowable mode. */ static machine_mode -select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes, +select_block_compare_mode (unsigned HOST_WIDE_INT offset, + unsigned HOST_WIDE_INT bytes, HOST_WIDE_INT align, bool word_mode_ok) { /* First see if we can do a whole load unit @@ -19322,7 +19323,7 @@ Do largest chunk possible without violating alignment rules. */ /* The most we can read without potential page crossing. */ - HOST_WIDE_INT maxread = ROUND_UP (bytes, align); + unsigned HOST_WIDE_INT maxread = ROUND_UP (bytes, align); if (word_mode_ok && bytes >= UNITS_PER_WORD) return word_mode; @@ -19410,8 +19411,8 @@ gcc_assert (GET_MODE (target) == SImode); /* Anything to move? */ - HOST_WIDE_INT bytes = INTVAL (bytes_rtx); - if (bytes <= 0) + unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx); + if (bytes == 0) return true; /* The code generated for p7 and older is not faster than glibc @@ -19432,14 +19433,14 @@ /* Strategy phase. How many ops will this take and should we expand it? */ - int offset = 0; + unsigned HOST_WIDE_INT offset = 0; machine_mode load_mode = select_block_compare_mode (offset, bytes, base_align, word_mode_ok); - int
[PATCH, bugfix] builtin expansion of strcmp for rs6000
This expands on the previous patch. For strcmp and for strncmp with N larger than 64, the first 64 bytes of comparison is expanded inline and then a call to strcmp or strncmp is emitted to compare the remainder if the strings are found to be equal at that point. -mstring-compare-inline-limit=N determines how many block comparisons are emitted. With the default 8, and 64-bit code, you get 64 bytes. Performance testing on a power8 system shows that the code is anywhere from 2-8 times faster than RHEL7.2 glibc strcmp/strncmp depending on alignment and length. In the process of adding this I discovered that the expansion being generated for strncmp had a bug in that it was not testing for a zero byte to terminate the comparison. As a result inputs like strncmp("AB\0CDEFGX", "AB\0CDEFGY", 9) would be compared not equal. The initial comparison of a doubleword would be equal so a second one would be fetched and compared, ignoring the zero byte that should have terminated comparison. The fix is to use a cmpb to check for zero bytes in the equality case before comparing the next chunk. I updated the strncmp-1.c test case to check for this, and also added a new strcmp-1.c test case to check strcmp expansion. Also both now have a length 100 tests to check the transition from the inline comparison to the library call for the remainder. ChangeLog 2017-01-11 Aaron Sawdey* config/rs6000/rs6000-protos.h (expand_strn_compare): Add arg. * config/rs6000/rs6000.c (expand_strn_compare): Add ability to expand strcmp. Fix bug where comparison didn't stop with zero byte. * config/rs6000/rs6000.md (cmpstrnsi): Args to expand_strn_compare. (cmpstrsi): Add pattern. gcc.dg/ChangeLog 2017-01-11 Aaron Sawdey * gcc.dg/strcmp-1.c: New. * gcc.dg/strncmp-1.c: Add test for a bug that escaped. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 244322) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -78,7 +78,7 @@ extern int expand_block_clear (rtx[]); extern int expand_block_move (rtx[]); extern bool expand_block_compare (rtx[]); -extern bool expand_strn_compare (rtx[]); +extern bool expand_strn_compare (rtx[], int); extern const char * rs6000_output_load_multiple (rtx[]); extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode); extern bool rs6000_is_valid_and_mask (rtx, machine_mode); Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 244322) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -19635,22 +19635,36 @@ OPERANDS[0] is the target (result). OPERANDS[1] is the first source. OPERANDS[2] is the second source. + If NO_LENGTH is zero, then: OPERANDS[3] is the length. - OPERANDS[4] is the alignment in bytes. */ + OPERANDS[4] is the alignment in bytes. + If NO_LENGTH is nonzero, then: + OPERANDS[3] is the alignment in bytes. */ bool -expand_strn_compare (rtx operands[]) +expand_strn_compare (rtx operands[], int no_length) { rtx target = operands[0]; rtx orig_src1 = operands[1]; rtx orig_src2 = operands[2]; - rtx bytes_rtx = operands[3]; - rtx align_rtx = operands[4]; + rtx bytes_rtx, align_rtx; + if (no_length) +{ + bytes_rtx = NULL; + align_rtx = operands[3]; +} + else +{ + bytes_rtx = operands[3]; + align_rtx = operands[4]; +} HOST_WIDE_INT cmp_bytes = 0; rtx src1 = orig_src1; rtx src2 = orig_src2; - /* If this is not a fixed size compare, just call strncmp. */ - if (!CONST_INT_P (bytes_rtx)) + /* If we have a length, it must be constant. This simplifies things + a bit as we don't have to generate code to check if we've exceeded + the length. Later this could be expanded to handle this case. */ + if (!no_length && !CONST_INT_P (bytes_rtx)) return false; /* This must be a fixed size alignment. */ @@ -19668,8 +19682,6 @@ gcc_assert (GET_MODE (target) == SImode); - HOST_WIDE_INT bytes = INTVAL (bytes_rtx); - /* If we have an LE target without ldbrx and word_mode is DImode, then we must avoid using word_mode. */ int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX @@ -19678,16 +19690,37 @@ int word_mode_size = GET_MODE_SIZE (word_mode); int offset = 0; + HOST_WIDE_INT bytes; /* N from the strncmp args if available. */ + HOST_WIDE_INT compare_length; /* How much we are going to compare inline. */ + if (no_length) +/* Use this as a standin to determine the mode to use. */ +bytes = rs6000_string_compare_inline_limit * word_mode_size; + else +bytes = INTVAL (bytes_rtx); + machine_mode