On 10/2/18 10:37 AM, Martin Sebor wrote:
> [1/4] - Introduce struct strlen_data_t into gimple-fold
>
> This patch introduces a new data structure to reduce the number
> of arguments and overloads of the get_range_strlen API. One of
> the goals of this change is to make the API safer to use for
> optimization (which looks for "permissive" results to account
> even for some undefined uses) vs warnings (which relies on
> conservative results to avoid false positives). The patch also
> adds provides explicit arguments to get_range_strlen and adds
> descriptive comments to make the affected code easier to follow.
> Beyond making use of the new data structure the patch makes no
> observable changes.
>
> The changes to gcc/testsuite/gcc.dg/strlenopt-51.c fix a few
> typos with no functional effect and tweak the helper macro
> used by the test to make debugging slightly easier.
>
> gcc-99999-1.diff
>
> [1/4] - Introduce struct strlen_data_t into gimple-fold.
>
> gcc/ChangeLog:
>
> * builtins.c (check_access): Document argumens to a function call.
> (check_strncat_sizes): Same.
> (expand_builtin_strncat): Same.
> * calls.c (maybe_warn_nonstring_arg): Same.
> * gimple-fold.h (struct strlen_data_t): New type.
> (get_range_strlen): New overload.
> * gimple-fold.c (struct strlen_data_t): New type.
> (get_range_strlen): Change declaration to take strlen_data_t*
> argument instead of length, flexp, eltsize, and nonstr.
> Adjust to use strlen_data_t members instead of other arguments.
> Also set strlen_data_t::maxsize to the same value as maxlen.
> (extern get_range_strlen): Define new overload.
> (get_maxval_strlen): Adjust to use strlen_data_t.
> * gimple-ssa-sprintf.c (get_string_length): Same.
>
> gcc/testsuite/ChangeLog:
> gcc.dg/strlenopt-51.c: Add counter to macros and fix typos.
>
> diff --git a/gcc/calls.c b/gcc/calls.c
> index e9660b6..11f00ad 100644
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1582,7 +1585,8 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
> {
> tree arg = CALL_EXPR_ARG (exp, argno);
> g if (!get_attr_nonstring_decl (arg))
> - get_range_strlen (arg, lenrng);
> + get_range_strlen (arg, lenrng, /* eltsize = */ 1,
> + /* strict = */ false);
> }
> }
> /* Fall through. */
As Bernd noted, something clearly went wrong here with the patch the "g"
at the beginning of the line. Hand-edited patch perhaps? Regardless I
assume you'll fix this.
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 1e84722..8f71e9c 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -1262,11 +1262,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator
> *gsi, tree c, tree len)
> }
>
>
> -/* Obtain the minimum and maximum string length or minimum and maximum
> - value of ARG in LENGTH[0] and LENGTH[1], respectively.
> +/* Try to obtain the range of the lengths of the string(s) referenced
> + by ARG, or the size of the largest array ARG referes to if the range
> + of lengths cannot be determined, and store all in *PDATA.
> If ARG is an SSA name variable, follow its use-def chains. When
> - TYPE == 0, if LENGTH[1] is not equal to the length we determine or
> - if we are unable to determine the length or value, return false.
> + TYPE == 0, then if PDATA->MAXLEN is not equal to the determined
> + length or if we are unable to determine the length or value, return
> + false.
> VISITED is a bitmap of visited variables.
> TYPE is 0 if string length should be obtained, 1 for maximum string
> length and 2 for maximum value ARG can have.
So can we clarify the return value? What is the return value for types
other than 0? I struggled with parsing that before this patch :-)
>
> static bool
> -get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
> - int fuzzy, bool *flexp, unsigned eltsize, tree *nonstr)
> +get_range_strlen (tree arg, bitmap *visited, int type, int fuzzy,
> + strlen_data_t *pdata)
So ISTM we should keep ELTSIZE as distinct parameter as its strictly an
input. That in turn allows *PDATA to be strictly filled in as an output.
>
>
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 26e2727..0d523e7 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -25,6 +25,53 @@ along with GCC; see the file COPYING3. If not see
> extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
> extern tree canonicalize_constructor_val (tree, tree);
> extern tree get_symbol_constant_value (tree);
> +
> +struct strlen_data_t
> +{
> + /* [MIN, MAXSIZE, MAXLEN] is a range describing the length of
> + a string of possibly unknown length. For a string of known
> + length the range is a constant where MIN == MAXSIZE == MAXLEN
> + holds.
> + For other strings, MIN is the length of the shortest string that
> + can be stored in the referenced object, i.e., MIN == 0. MAXSIZE
> + is the size of the largest array referenced by the expression.
> + MAXLEN is the length of the longest sequence of non-zero bytes
> + in memory reference by the expression. For such strings,
> + MIN <= MAXSIZE <= MAXLEN holds. For example, given:
Hmm, the description here is a little confusing WRT MAXSIZE vs MAXLEN.
I think the distinction we're trying to make is that MAXSIZE is the
maximum respecting subobject boundaries and that MAXLEN is the maximum
without respecting subobject boundaries. If so, then I think we need
to look for a better name than MAXSIZE and MAXLEN.
> + /* When non-null, NONSTR refers to the declaration known to store
> + an unterminated constant character array, as in:
> + const char s[] = { 'a', 'b', 'c' };
> + It is used to diagnose uses of such arrays in functions such as
> + strlen() that expect a nul-terminated string as an argument. */
> + tree nonstr;
So rather than NONSTR, DECL may make more sense -- if for no other
reason than you don't have to think in terms of "not a string".
> + /* ELTSIZE is set to 1 for single byte character strings, and 2 or 4
> + for wide characer strings. */
> + unsigned eltsize;
Bernd's suggestion that we separate the input vs output paramters may be
a reasonable one -- I think this is the only in-parameter you're passing
with the structure, right? And everything else is a pure output? If so
we may be better off continuing to pass the element size as a separate
parameter.
> + /* FLEXARRAY is true if the range of the string lengths has been
> + obtained from the upper bound of an array at the end of a struct.
> + Such an array may hold a string that's longer than its upper bound
> + due to it being used as a poor-man's flexible array member. */
> + bool flexarray;
> +
> + /* Set ELTSIZE and value-initialize all other members. */
> + strlen_data_t (unsigned eltbytes)
> + : minlen (), maxlen (), maxsize (), nonstr (), eltsize (eltbytes),
> + flexarray () { }
I think if you pull ELTSIZE out and pass it as a distinct parameter,
then you don't need a ctor and you can have a POD. You can then
initialize with memset rather than having to individually initialize
each field -- meaning it's easier to add more output fields in the future.
I don't think any of the suggestions above change the behavior of the
patch. Let's hold off committing though (I assume you've got a GIT
topic branch where you can make these changes and update the subsequent
patches independent of everything else...)
jeff