On Wed, 27 Jun 2018, Jeff Law wrote: > > On 06/18/2018 09:37 AM, Qing Zhao wrote: > > Hi, > > > > this is the 3rd (and the last) patch for PR78809: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 > > Inline strcmp with small constant strings > > > > The design doc for PR78809 is at: > > https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html > > > > this patch is for the third part of change of PR78809. > > > > C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n) > > if the result is NOT used to do simple equality test against zero, one of > > "s1" or "s2" is a small constant string, n is a constant, and the Min value > > of > > the length of the constant string and "n" is smaller than a predefined > > threshold T, > > inline the call by a byte-to-byte comparision sequence to avoid calling > > overhead. > > > > adding test case strcmpopt_5.c into gcc.dg for part C of PR78809. > > > > bootstraped and tested on both X86 and Aarch64. no regression. > > > > I have done some experiments to check the run-time performance impact > > and code-size impact from such inlining with different threshold for the > > length of the constant string to inline, the data were recorded in the > > attachments of > > PR78809: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809. > > > > and finally decide that the default value of the threshold is 3. > > this value of the threshold can be adjusted by the new option: > > > > --param builtin-string-cmp-inline-length= > > > > The following is the patch. > > > > thanks. > > > > Qing > > > > gcc/ChangeLog: > > > > +2018-06-18 Qing Zhao <qing.z...@oracle.com> > > + > > + PR middle-end/78809 > > + * builtins.c (expand_builtin_memcmp): Inline the calls first > > + when result_eq is false. > > + (expand_builtin_strcmp): Inline the calls first. > > + (expand_builtin_strncmp): Likewise. > > + (inline_string_cmp): New routine. Expand a string compare > > + call by using a sequence of char comparison. > > + (inline_expand_builtin_string_cmp): New routine. Inline expansion > > + a call to str(n)cmp/memcmp. > > + * doc/invoke.texi (--param builtin-string-cmp-inline-length): New > > option. > > + * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New. > > + > > > > gcc/testsuite/ChangeLog: > > > > +2018-06-18 Qing Zhao <qing.z...@oracle.com> > > + > > + PR middle-end/78809 > > + * gcc.dg/strcmpopt_5.c: New test. > So I still need to dig into this patch. But I wanted to raise an > potential issue and get yours and Martin's thoughts on it. > > Martin (and others) have been working hard to improve GCC's ability to > give good diagnostics for various problems with calls to mem* and str* > functions (buffer overflows, restrict issues, etc). > > One of the problems Martin has identified is early conversion of these > calls into inlined direct operations. If those conversions happen prior > to the analysis for warnings we obviously can't issue any relevant warnings.
>From the looks of the changelog this seems to be done at RTL expansion time -- which also makes me question why targets do not provide cmpstr[n] / cmpmem optabs when doing the inlining is so obviously beneficial? Note I didn't look at the actual patch. Richard. > > > + > > > > > > > > 0001-3nd-Patch-for-PR78009.patch > > > > > > From fcf6ced44e6e3e4a14894cc8f43ac39bc17aafee Mon Sep 17 00:00:00 2001 > > From: qing zhao <qing.z...@oracle.com> > > Date: Thu, 14 Jun 2018 14:32:46 -0700 > > Subject: [PATCH] 3nd Patch for PR78009 > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 > > Inline strcmp with small constant strings > > > > The design doc for PR78809 is at: > > https://www.mail-archive.com/gcc@gcc.gnu.org/msg83822.html > > > > this patch is for the third part of change of PR78809. > > > > C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n) > > if the result is NOT used to do simple equality test against zero, one of > > "s1" or "s2" is a small constant string, n is a constant, and the Min value > > of > > the length of the constant string and "n" is smaller than a predefined > > threshold T, > > inline the call by a byte-to-byte comparision sequence to avoid calling > > overhead. > > > > adding test case strcmpopt_5.c into gcc.dg for part C of PR78809. > > > > bootstraped and tested on both X86 and Aarch64. no regression. > > --- > > gcc/builtins.c | 172 > > +++++++++++++++++++++++++++++++++++-- > > gcc/doc/invoke.texi | 5 ++ > > gcc/params.def | 4 + > > gcc/testsuite/gcc.dg/strcmpopt_5.c | 80 +++++++++++++++++ > > 4 files changed, 253 insertions(+), 8 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/strcmpopt_5.c > > > > diff --git a/gcc/builtins.c b/gcc/builtins.c > > index 6b3e6b2..cf50d05 100644 > > --- a/gcc/builtins.c > > +++ b/gcc/builtins.c > > @@ -4358,6 +4360,17 @@ expand_builtin_memcmp (tree exp, rtx target, bool > > result_eq) > > POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE)) > > return NULL_RTX; > > > > + /* due to the performance benefit, always inline the calls first > > + when result_eq is false. */ > Please capitalize the first word in sentences like this. This nit > appears in most of your comments. > > > > + rtx result = NULL_RTX; > > + > > + if (!result_eq) > > + { > > + result = inline_expand_builtin_string_cmp (exp, target, true); > > + if (result) > > + return result; > > + } > > + > So I believe you do inline expansion here prior to the checks for > Wstringop_overflow. I think you can safely move this code to after the > warn_stringop_overflow checks. Though you may want to make this code > conditional on both calls to check_access returning true and avoiding > your transformation if either or both calls return false. > > Alternately you'd need to verify that inline_expand_builtin_string_cmp > always returns false for cases which are going to generate a warning. > But that seems a bit tougher to maintain over time if we were to add > more warnings to this code. > > > > > @@ -4448,6 +4461,12 @@ expand_builtin_strcmp (tree exp, ATTRIBUTE_UNUSED > > rtx target) > > if (!validate_arglist (exp, POINTER_TYPE, POINTER_TYPE, VOID_TYPE)) > > return NULL_RTX; > > > > + /* due to the performance benefit, always inline the calls first. */ > > + rtx result = NULL_RTX; > > + result = inline_expand_builtin_string_cmp (exp, target, false); > > + if (result) > > + return result; > Similarly. > > > > > @@ -4562,6 +4580,12 @@ expand_builtin_strncmp (tree exp, ATTRIBUTE_UNUSED > > rtx target, > > POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE)) > > return NULL_RTX; > > > > + /* due to the performance benefit, always inline the calls first. */ > > + rtx result = NULL_RTX; > > + result = inline_expand_builtin_string_cmp (exp, target, false); > > + if (result) > > + return result; > Similarly here. > > > @@ -6640,6 +6664,138 @@ expand_builtin_goacc_parlevel_id_size (tree exp, > > rtx target, int ignore) > > return target; > > } > > > > +/* Expand a string compare operation using a sequence of char comparison > > + to get rid of the calling overhead. > > + > > + var_str is the variable string source; > > + const_str is the constant string source; > > + length is the number of chars to compare; > > + const_str_n indicates which source string is the constant string; > > + is_memcmp indicates whether it's a memcmp or strcmp. > When referring to arguments in comments, please capitalize them. ie > VAR_STR, CONST_STR, etc. > > > > + > > + to: (assume const_str_n is 2, i.e., arg2 is a constant string) > > + > > + target = var_str[0] - const_str[0]; > > + if (target != 0) > > + goto ne_label; > > + ... > > + target = var_str[length - 2] - const_str[length - 2]; > > + if (target != 0) > > + goto ne_label; > > + target = var_str[length - 1] - const_str[length - 1]; > > + ne_label: > > + */ > But leave them as lower case in code fragments like this. > > > So this generally looks pretty good. THe biggest technical concern is > making sure we're doing the right thing WRT issuing warnings. You can > tackle that problem by deferring inlining to a later point after > warnings have been issued or by verifying that your routines do not > inline in cases where warnings will be issued. It may be worth adding > testcases for these issues. > > There's a large number of comments that need capitalization fixes. > > Given there was no measured runtime performance impact, but slight > improvements on codesize for values <= 3, let's go ahead with that as > the default. > > Can you address the issues above and repost for final review? > > Thanks, > jeff > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)