Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/11/2016 05:21 PM, Martin Sebor wrote: So I think the return value needs a bit of clarification here. Take guidance from our discussion on this thread. OK with that fixed. jeff The "strange test failures" that I wrote about in a separate thread late last week prompted me to re-review the patch and add more test cases. Those in turn exposed a bug in the adjust_range_for_overflow function involving types of the same precision but different sign where converting an unsigned range with an upper bound in excess of the directive's TYPE_MAX would incorrectly accept the converted range even though the new upper bound was less than the lower bound. The updated patch corrects this oversight. In addition, it adjusts the handling of the obscure corner case of zero precision and zero argument which results in zero bytes (except in some even more obscure cases involving some flags for some conversions). For instance: printf ("%.0i", 0); results in zero bytes, but printf ("%+.0i", 0); results in 1 byte (and prints '+'). This is tracked in bug 78606. Although the differences between the approved patch and the update are very small I repost it in case one of you would like to double check them. If not I'll commit the updated patch later in the week. Martin gcc-78622.diff PR middle-end/78622 - -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping PR middle-end78606 - -Wformat-length/-fprintf-return-value incorrect for %+.0i and %.0o with zero value gcc/ChangeLog: PR middle-end/78622 * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange rather than res.bounded. (get_width_and_precision): Set precision to -1 when negative. (adjust_range_for_overflow): New function. (format_integer): Correct the handling of the space, plus, and pound flags, and the special case of zero precision. Always set res.bounded to true unless either precision or width is specified and unknown. Call adjust_range_for_overflow. Avoid use zero as the shortest value when precision is specified but unknown. (format_directive): Remove vestigial quoting. Always inform of argument value or range when it's available. (add_bytes): Correct the computation of boundrange used to decide whether a warning is of a "maybe" or "defnitely" kind. gcc/testsuite/ChangeLog: PR middle-end/78622 * gcc.c-torture/execute/pr78622.c: New test. * gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined behavior inadvertently introduced in a previous commit. Tighten up final checking. * gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity. Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-5.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and add a final optimization check. * gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases. * gcc.dg/tree-ssa/pr78622.c: New test. OK. jeff
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/07/2016 12:17 PM, Martin Sebor wrote: OK. So is the hangup really a problem in how the return type is documented? I parsed the comment as essentially saying we return true if the range gets adjusted in any way -- thus a sign change in the first block would qualify, but we returned false which seemed inconsistent. Looking at it again, what I think it's saying is we're returning true only for a subset of adjustments to the range, ie, those cases where the postcondition does not hold. Correct? Correct. Would changing the description of the return value to this make it clearer? Return true when the range has been adjusted to the full unsigned range of DIRTYPE, [0, DIRTYPE_MAX], false otherwise. Yea, that does help. I see you've got an updated version posted. Let me take a final (?) looksie now that I have a better understanding of the return value. I think that was the biggest stumbling block on my side. Jeff
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
So I think the return value needs a bit of clarification here. Take guidance from our discussion on this thread. OK with that fixed. jeff The "strange test failures" that I wrote about in a separate thread late last week prompted me to re-review the patch and add more test cases. Those in turn exposed a bug in the adjust_range_for_overflow function involving types of the same precision but different sign where converting an unsigned range with an upper bound in excess of the directive's TYPE_MAX would incorrectly accept the converted range even though the new upper bound was less than the lower bound. The updated patch corrects this oversight. In addition, it adjusts the handling of the obscure corner case of zero precision and zero argument which results in zero bytes (except in some even more obscure cases involving some flags for some conversions). For instance: printf ("%.0i", 0); results in zero bytes, but printf ("%+.0i", 0); results in 1 byte (and prints '+'). This is tracked in bug 78606. Although the differences between the approved patch and the update are very small I repost it in case one of you would like to double check them. If not I'll commit the updated patch later in the week. Martin PR middle-end/78622 - -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping PR middle-end78606 - -Wformat-length/-fprintf-return-value incorrect for %+.0i and %.0o with zero value gcc/ChangeLog: PR middle-end/78622 * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange rather than res.bounded. (get_width_and_precision): Set precision to -1 when negative. (adjust_range_for_overflow): New function. (format_integer): Correct the handling of the space, plus, and pound flags, and the special case of zero precision. Always set res.bounded to true unless either precision or width is specified and unknown. Call adjust_range_for_overflow. Avoid use zero as the shortest value when precision is specified but unknown. (format_directive): Remove vestigial quoting. Always inform of argument value or range when it's available. (add_bytes): Correct the computation of boundrange used to decide whether a warning is of a "maybe" or "defnitely" kind. gcc/testsuite/ChangeLog: PR middle-end/78622 * gcc.c-torture/execute/pr78622.c: New test. * gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined behavior inadvertently introduced in a previous commit. Tighten up final checking. * gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity. Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-5.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and add a final optimization check. * gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases. * gcc.dg/tree-ssa/pr78622.c: New test. diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index 99a635a..905917d 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -637,7 +637,7 @@ min_bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result &res) if (HOST_WIDE_INT_MAX <= navail) return navail; - if (1 < warn_format_length || res.bounded) + if (1 < warn_format_length || res.knownrange) { /* At level 2, or when all directives output an exact number of bytes or when their arguments were bounded by known @@ -785,7 +785,7 @@ get_width_and_precision (const conversion_spec &spec, { prec = tree_to_shwi (spec.star_precision); if (prec < 0) - prec = 0; + prec = -1; } else prec = HOST_WIDE_INT_MIN; @@ -795,6 +795,69 @@ get_width_and_precision (const conversion_spec &spec, *pprec = prec; } +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual + argument, due to the conversion from either *ARGMIN or *ARGMAX to + the type of the directive's formal argument it's possible for both + to result in the same number of bytes or a range of bytes that's + less than the number of bytes that would result from formatting + some other value in the range [*ARGMIN, *ARGMAX]. This can be + determined by checking for the actual argument being in the range + of the type of the directive. If it isn't it must be assumed to + take on the full range of the directive's type. + Return true when the range has been adjusted to the full unsigned + range of DIRTYPE, or [0, DIRTYPE_MAX], and false otherwise. */ + +static bool +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) +{ + tree argtype = TREE_TYPE (*argmin); + unsigned argprec = TYPE_PRECISION (argtype); + unsigned dirprec = TYPE_PRECISION (dirtype); + + /* If the actual argument and the directive's argument have the same + precision and sign there can be no overflow and so there is nothing + t
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/07/2016 11:43 AM, Jeff Law wrote: On 12/02/2016 03:54 PM, Martin Sebor wrote: Thanks for looking at this! I realize it's dense code and not easy to make sense out of. + + if (TREE_CODE (*argmin) == INTEGER_CST + && TREE_CODE (*argmax) == INTEGER_CST + && (dirprec >= argprec + || integer_zerop (int_const_binop (RSHIFT_EXPR, + int_const_binop (MINUS_EXPR, + *argmax, + *argmin), + size_int (dirprec) + { +*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0, false); +*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0, false); +return false; + } So in this case we're not changing the values, we're just converting them to a equal or wider type, right? Thus no adjustment (what about a sign change? Is that an adjustment?) and we return false per the function's specifications. This casts the value to the destination type, so it may result in different values. The important postcondition it preserves is that the difference ARGMAX - ARGMIN is less than or equal to TYPE_MAX of the directive's unsigned TYPE. (If it isn't, the range cannot be converted.) This lets us take, say: int f (int i) { if (i < 1024 || 1033 < i) i = 1024; return snprintf (0, 0, "%hhi", i); } and fold the function call into 1 because (signed char)1024 is 0 and (signed char)1033 is 9, and every other value in that same original range is also in the same range after the conversion. We know it's safe because (1033 - 1024 <= UCHAR_MAX) holds. But the code in question is guarded by dirprec >= argprec. Thus aren't we converting to an equal or wider type? In the case of converting to a wider type, ISTM the values won't change and thus we return false. I just saw your other response after I typed this up (thanks!) but so just to close the loop: You're right, when converting to a wider type the values won't change. The guard is an OR expression: dirprec >= argprec OR 0 == ((ARGMAX - ARGMIN) >> TYPE_PRECISION (dirtype)) and so when (dirprec < argprec) holds the values might change if the other operand is true. In both of these cases the function returns false to indicate that the original range the arguments were in hasn't changed as a result. If we are converting to the same sized type, but with a different signedness, then the values will have been adjusted and ISTM we ought to be returning true in that case. But the code actually returns false. I must be missing something here. The return value of the function is only used to decide the phrasing of the notes printed after warnings and what values to include in them. Returning false means that the notes will say something like "directive argument in the range [ARGMIN, ARGMAX]" where the values are those determined by VRP. Returning true means that the note will instead say "using the range [TYPE_MIN, 0] for directive argument." This distinction will (I hope) let the user know that the second case is less accurate than the first. It's somewhat subtle but I think useful to understand why the checker decided to warn. Once I'm done with all my work I'd like to document this in more detail on the Wiki. What about overflows when either argmin or argmax won't fit into dirtype without an overflow? What's the right behavior there? That's fine just as long as the property above holds. I think the algorithm works but the code came from tree-vrp where there are all sorts of additional checks for some infinities which VRP distinguishes from type limits like SCHAR_MIN and _MAX. I don't fully understand those and so I'd feel better if someone who does double checked this code. That's what prompted my question about overflows. It was a general concern after reading the VRP code. I did not have a specific case in mind that would be mis-handled. Okay. + tree dirmin = TYPE_MIN_VALUE (dirtype); + tree dirmax = TYPE_MAX_VALUE (dirtype); From this point onward argmin/argmax are either not integers or we're doing a narrowing conversion, right? In both cases the fact that we're doing a narrowing conversion constrains the range Argmin and argmax are always integers. The rest of the function handles the case where the postcondition above doesn't hold (e.g., in function g above). OK. So is the hangup really a problem in how the return type is documented? I parsed the comment as essentially saying we return true if the range gets adjusted in any way -- thus a sign change in the first block would qualify, but we returned false which seemed inconsistent. Looking at it again, what I think it's saying is we're returning true only for a subset of adjustments to the range, ie, those cases where the postcondition does not hold. Correct? Correct. Would changing the description of the return value to this make it clearer? Return true when the range has been adjuste
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/05/2016 08:43 PM, Martin Sebor wrote: Martin $ cat b.c && /build/gcc-78622/gcc/xgcc -B /build/gcc-78622/gcc -O2 -S -Wall -Wextra -Wpedantic b.c char d[1]; void f (int i) { if (i < 1024 || 1033 < i) i = 1024; __builtin_sprintf (d + 1, "%hhi", i); } void g (int i) { if (i < 1024 || 3456 < i) i = 1024; __builtin_sprintf (d + 1, "%hhi", i); } b.c: In function ‘f’: b.c:7:30: warning: ‘%hhi’ directive writing 1 byte into a region of size 0 [-Wformat-length=] __builtin_sprintf (d + 1, "%hhi", i); ^~~~ b.c:7:29: note: directive argument in the range [1024, 1033] __builtin_sprintf (d + 1, "%hhi", i); ^~ b.c:7:3: note: format output 2 bytes into a destination of size 0 __builtin_sprintf (d + 1, "%hhi", i); ^~~~ b.c: In function ‘g’: b.c:14:30: warning: ‘%hhi’ directive writing between 1 and 4 bytes into a region of size 0 [-Wformat-length=] __builtin_sprintf (d + 1, "%hhi", i); ^~~~ b.c:14:29: note: using the range [0, -128] for directive argument __builtin_sprintf (d + 1, "%hhi", i); ^~ b.c:14:3: note: format output between 2 and 5 bytes into a destination of size 0 __builtin_sprintf (d + 1, "%hhi", i); ^~~~ gcc-78622.diff PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping gcc/ChangeLog: PR middle-end/78622 * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange rather than res.bounded. (adjust_range_for_overflow): New function. (format_integer): Always set res.bounded to true unless either precision or width is specified and unknown. Call adjust_range_for_overflow. (format_directive): Remove vestigial quoting. Always inform of argument value or range when it's available. (add_bytes): Correct the computation of boundrange used to decide whether a warning is of a "maybe" or "defnitely" kind. gcc/testsuite/ChangeLog: PR middle-end/78622 * gcc.c-torture/execute/pr78622.c: New test. * gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined behavior inadvertently introduced in a previous commit. Tighten up final checking. * gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity. Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-5.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and add a final optimization check. * gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases. diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index 99a635a..a888f55 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -795,6 +795,59 @@ get_width_and_precision (const conversion_spec &spec, *pprec = prec; } +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual + argument, due to the conversion from either *ARGMIN or *ARGMAX to + the type of the directive's formal argument it's possible for both + to result in the same number of bytes or a range of bytes that's + less than the number of bytes that would result from formatting + some other value in the range [*ARGMIN, *ARGMAX]. This can be + determined by checking for the actual argument being in the range + of the type of the directive. If it isn't it must be assumed to + take on the full range of the directive's type. + Return true when the range has been adjusted to the range of + DIRTYPE, false otherwise. */ So I think the return value needs a bit of clarification here. Take guidance from our discussion on this thread. OK with that fixed. jeff
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/02/2016 03:54 PM, Martin Sebor wrote: Thanks for looking at this! I realize it's dense code and not easy to make sense out of. + + if (TREE_CODE (*argmin) == INTEGER_CST + && TREE_CODE (*argmax) == INTEGER_CST + && (dirprec >= argprec + || integer_zerop (int_const_binop (RSHIFT_EXPR, + int_const_binop (MINUS_EXPR, + *argmax, + *argmin), + size_int (dirprec) + { +*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0, false); +*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0, false); +return false; + } So in this case we're not changing the values, we're just converting them to a equal or wider type, right? Thus no adjustment (what about a sign change? Is that an adjustment?) and we return false per the function's specifications. This casts the value to the destination type, so it may result in different values. The important postcondition it preserves is that the difference ARGMAX - ARGMIN is less than or equal to TYPE_MAX of the directive's unsigned TYPE. (If it isn't, the range cannot be converted.) This lets us take, say: int f (int i) { if (i < 1024 || 1033 < i) i = 1024; return snprintf (0, 0, "%hhi", i); } and fold the function call into 1 because (signed char)1024 is 0 and (signed char)1033 is 9, and every other value in that same original range is also in the same range after the conversion. We know it's safe because (1033 - 1024 <= UCHAR_MAX) holds. But the code in question is guarded by dirprec >= argprec. Thus aren't we converting to an equal or wider type? In the case of converting to a wider type, ISTM the values won't change and thus we return false. If we are converting to the same sized type, but with a different signedness, then the values will have been adjusted and ISTM we ought to be returning true in that case. But the code actually returns false. I must be missing something here. What about overflows when either argmin or argmax won't fit into dirtype without an overflow? What's the right behavior there? That's fine just as long as the property above holds. I think the algorithm works but the code came from tree-vrp where there are all sorts of additional checks for some infinities which VRP distinguishes from type limits like SCHAR_MIN and _MAX. I don't fully understand those and so I'd feel better if someone who does double checked this code. That's what prompted my question about overflows. It was a general concern after reading the VRP code. I did not have a specific case in mind that would be mis-handled. + + tree dirmin = TYPE_MIN_VALUE (dirtype); + tree dirmax = TYPE_MAX_VALUE (dirtype); From this point onward argmin/argmax are either not integers or we're doing a narrowing conversion, right? In both cases the fact that we're doing a narrowing conversion constrains the range Argmin and argmax are always integers. The rest of the function handles the case where the postcondition above doesn't hold (e.g., in function g above). OK. So is the hangup really a problem in how the return type is documented? I parsed the comment as essentially saying we return true if the range gets adjusted in any way -- thus a sign change in the first block would qualify, but we returned false which seemed inconsistent. Looking at it again, what I think it's saying is we're returning true only for a subset of adjustments to the range, ie, those cases where the postcondition does not hold. Correct? + + if (TYPE_UNSIGNED (dirtype)) +{ + *argmin = dirmin; + *argmax = dirmax; +} + else +{ + *argmin = integer_zero_node; + *argmax = dirmin; +} Should this be dirmax? Am I missing something here? It's dirmin because for a signed type, due to the sign, it results in more bytes on output than dirmin would. (E.g., with SCHAR_MIN = -128 and SCHAR_MAX = 127 it's four bytes vs three.) Ah. I understand that. THanks for clarifying. jeff
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/05/2016 01:26 PM, Jakub Jelinek wrote: Hi! On Thu, Dec 01, 2016 at 07:31:18PM -0700, Martin Sebor wrote: +static bool +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) +{ + if (TYPE_UNSIGNED (dirtype)) +{ + *argmin = dirmin; + *argmax = dirmax; +} + else +{ + *argmin = integer_zero_node; + *argmax = dirmin; +} I still don't really like this mixing of ranges of values and picking of values which result in shortest and longest representation, it is confusing and will be a maintainance nightmare. IMHO much cleaner is first figure out the range the argument (in argtype) has. I.e. look at VR_RANGE and if it is missing, perhaps find out another argtype and in any case, use TYPE_{MIN,MAX}_VALUE (argtype) as the range. I think that should probably be the range presented to the user in diagnostics (i.e. res.arg{min,max}). Next step is to adjust this range for the case where dirtype is different from argtype. This should be done regardless of what way you get the first range from (whether from VR_RANGE or VR_VARYING etc.). The result of this still should be a range of values in dirtype. And the last step should be to pick the values from that range which has shortest and longest representation. For unsigned dirtype that are the bounds of the range from earlier step, for signed dirtype something different (if both bounds are >= 0, then also just those bounds, if both bounds are < 0, then the bounds swapped, otherwise 0 as minimum, then e.g. try both bounds what has longer representation, or take some short path e.g. if abs of the negative bound is >= the positive bound, then use the negative bound as longest, otherwise try both). I take this as your confirmation that the function does do the right thing. If not, or if you can't confirm that, it would be helpful if you could let me know so that I can ask Richard to look it over. The pass already does pretty much what you describe but I was able to simplify the format_integer function somewhat and also arrange for the informational note mentioning the range of argument values to more closely reflect what I think you suggest. With the attached patch the following example produces the output below. In the first function, because the pass uses the actual argument range unchanged, the warning prints the exact byte count and the note the original range. In the second function, the range must be adjusted to that of the directive's type, and the warning prints a range of bytes and the note says "using the range [0, -128]" to indicate that it used a different range than the range of the actual argument. It might be possible to change the note to say something like "using the range of type 'signed char' or something like that, to make it even clearer that the whole range of the type is being used. But these further tweaks should be made independently of this patch, perhaps as part bug 77696 if and when I get to it. Martin $ cat b.c && /build/gcc-78622/gcc/xgcc -B /build/gcc-78622/gcc -O2 -S -Wall -Wextra -Wpedantic b.c char d[1]; void f (int i) { if (i < 1024 || 1033 < i) i = 1024; __builtin_sprintf (d + 1, "%hhi", i); } void g (int i) { if (i < 1024 || 3456 < i) i = 1024; __builtin_sprintf (d + 1, "%hhi", i); } b.c: In function ‘f’: b.c:7:30: warning: ‘%hhi’ directive writing 1 byte into a region of size 0 [-Wformat-length=] __builtin_sprintf (d + 1, "%hhi", i); ^~~~ b.c:7:29: note: directive argument in the range [1024, 1033] __builtin_sprintf (d + 1, "%hhi", i); ^~ b.c:7:3: note: format output 2 bytes into a destination of size 0 __builtin_sprintf (d + 1, "%hhi", i); ^~~~ b.c: In function ‘g’: b.c:14:30: warning: ‘%hhi’ directive writing between 1 and 4 bytes into a region of size 0 [-Wformat-length=] __builtin_sprintf (d + 1, "%hhi", i); ^~~~ b.c:14:29: note: using the range [0, -128] for directive argument __builtin_sprintf (d + 1, "%hhi", i); ^~ b.c:14:3: note: format output between 2 and 5 bytes into a destination of size 0 __builtin_sprintf (d + 1, "%hhi", i); ^~~~ PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping gcc/ChangeLog: PR middle-end/78622 * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange rather than res.bounded. (adjust_range_for_overflow): New function. (format_integer): Always set res.bounded to true unless either precision or width is specified and unknown. Call adjust_range_for_overflow. (format_directive): Remove vestigial quoting. Always inform of argument value or range when it's available. (add_bytes): Correct the computation of boundrange used to decide whether a warning is of a "maybe" or "defnitely" kind. gcc/testsuite/ChangeLog: PR middle-end
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
Hi! On Thu, Dec 01, 2016 at 07:31:18PM -0700, Martin Sebor wrote: > +static bool > +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) > +{ > + if (TYPE_UNSIGNED (dirtype)) > +{ > + *argmin = dirmin; > + *argmax = dirmax; > +} > + else > +{ > + *argmin = integer_zero_node; > + *argmax = dirmin; > +} I still don't really like this mixing of ranges of values and picking of values which result in shortest and longest representation, it is confusing and will be a maintainance nightmare. IMHO much cleaner is first figure out the range the argument (in argtype) has. I.e. look at VR_RANGE and if it is missing, perhaps find out another argtype and in any case, use TYPE_{MIN,MAX}_VALUE (argtype) as the range. I think that should probably be the range presented to the user in diagnostics (i.e. res.arg{min,max}). Next step is to adjust this range for the case where dirtype is different from argtype. This should be done regardless of what way you get the first range from (whether from VR_RANGE or VR_VARYING etc.). The result of this still should be a range of values in dirtype. And the last step should be to pick the values from that range which has shortest and longest representation. For unsigned dirtype that are the bounds of the range from earlier step, for signed dirtype something different (if both bounds are >= 0, then also just those bounds, if both bounds are < 0, then the bounds swapped, otherwise 0 as minimum, then e.g. try both bounds what has longer representation, or take some short path e.g. if abs of the negative bound is >= the positive bound, then use the negative bound as longest, otherwise try both). Jakub
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
Thanks for looking at this! I realize it's dense code and not easy to make sense out of. PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping gcc/ChangeLog: PR middle-end/78622 * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange rather than res.bounded. (adjust_range_for_overflow): New function. (format_integer): Always set res.bounded to true unless either precision or width is specified and unknown. Call adjust_range_for_overflow. (format_directive): Remove vestigial quoting. (add_bytes): Correct the computation of boundrange used to decide whether a warning is of a "maybe" or "defnitely" kind. s/defnitely/definitely/ Will fix. +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual + argument, due to the conversion from either *ARGMIN or *ARGMAX to + the type of the directive's formal argument it's possible for both + to result in the same number of bytes or a range of bytes that's + less than the number of bytes that would result from formatting + some other value in the range [*ARGMIN, *ARGMAX]. This can be + determined by checking for the actual argument being in the range + of the type of the directive. If it isn't it must be assumed to + take on the full range of the directive's type. + Return true when the range has been adjusted to the range of + DIRTYPE, false otherwise. */ I wish I'd counted how many times I read that. Let me see if I can word it better. + +static bool +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) +{ + unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin)); + unsigned dirprec = TYPE_PRECISION (dirtype); + + /* The logic here was inspired/lifted from the CONVERT_EXPR_CODE_P + branch in the extract_range_from_unary_expr function in tree-vrp.c. + */ Formatting it. If the comment-close won't fit, then line wrap prior to the last word in the comment. Okay. I didn't remember what the exact rules were. + + if (TREE_CODE (*argmin) == INTEGER_CST + && TREE_CODE (*argmax) == INTEGER_CST + && (dirprec >= argprec + || integer_zerop (int_const_binop (RSHIFT_EXPR, + int_const_binop (MINUS_EXPR, + *argmax, + *argmin), + size_int (dirprec) + { +*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0, false); +*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0, false); +return false; + } So in this case we're not changing the values, we're just converting them to a equal or wider type, right? Thus no adjustment (what about a sign change? Is that an adjustment?) and we return false per the function's specifications. This casts the value to the destination type, so it may result in different values. The important postcondition it preserves is that the difference ARGMAX - ARGMIN is less than or equal to TYPE_MAX of the directive's unsigned TYPE. (If it isn't, the range cannot be converted.) This lets us take, say: int f (int i) { if (i < 1024 || 1033 < i) i = 1024; return snprintf (0, 0, "%hhi", i); } and fold the function call into 1 because (signed char)1024 is 0 and (signed char)1033 is 9, and every other value in that same original range is also in the same range after the conversion. We know it's safe because (1033 - 1024 <= UCHAR_MAX) holds. But in in this: int g (int i) { if (i < 1024 || 1289 < i) i = 1024; return snprintf (0, 0, "%hhi", i); } even though (signed char)1024 is 0 and (signed char)1289 is also 9 like above, since (1289 - 1024) is 265 and thus greater than UCHAR_MAX, folding the result wouldn't be safe (for example, (sighed char)1034 yields 10). I picture this as a window bounded by the range of the directive's type sliding within another window bounded by the argument's type: [<-[...dirtype...]--->] the outer brackets delimit the range of the argument and the inner ones the directive's. The smaller window can slide left and right and it can shrink but it can't be wider that the directive's type's range. What about overflows when either argmin or argmax won't fit into dirtype without an overflow? What's the right behavior there? That's fine just as long as the property above holds. I think the algorithm works but the code came from tree-vrp where there are all sorts of additional checks for some infinities which VRP distinguishes from type limits like SCHAR_MIN and _MAX. I don't fully understand those and so I'd feel better if someone who does double checked this code. + + tree dirmin = TYPE_MIN_VALUE (dirtype); + tree dirmax = TYPE_MAX_VALUE (dirtype); From this point onward argmin/argmax are either not integers or we're doing a narrowing conversion, right? In both cases the fact that we're doing a narrowing conversion constrains th
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/01/2016 07:31 PM, Martin Sebor wrote: On 12/01/2016 02:15 AM, Jakub Jelinek wrote: On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote: Isn't this too simplistic? I mean, if you have say dirtype of signed char and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg has range 32, 64, while I think your routine will yield -128, 127 (well, 0 as min and -128 as max as that is what you return for signed type). Can't you subtract argmax - argmin (best just in wide_int, no need to build trees), and use what you have just for the case where that number doesn't fit into the narrower precision, otherwise if argmin - (dirtype) argmin == argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax as the range, and in case that it crosses a boundary figure if you can do anything than the above? Guess all cases of signed/unsigned dirtype and/or argtype need to be considered. Richard noted that you could have a look at CONVERT_EXPR_CODE_P handling in extract_range_from_unary_expr. I think it is the || (vr0.type == VR_RANGE && integer_zerop (int_const_binop (RSHIFT_EXPR, int_const_binop (MINUS_EXPR, vr0.max, vr0.min), size_int (TYPE_PRECISION (outer_type))) part that is important here for the narrowing conversion. Thanks, that was a helpful suggestion! Attached is an update that follows the vrp approach. I assume the infinity stuff is specific to the VRP pass and not something I need to worry about here. Martin PS To your earlier question, argmin and argmax have the same meaning in the added helper function as in its caller. gcc-78622.diff PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping gcc/ChangeLog: PR middle-end/78622 * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange rather than res.bounded. (adjust_range_for_overflow): New function. (format_integer): Always set res.bounded to true unless either precision or width is specified and unknown. Call adjust_range_for_overflow. (format_directive): Remove vestigial quoting. (add_bytes): Correct the computation of boundrange used to decide whether a warning is of a "maybe" or "defnitely" kind. s/defnitely/definitely/ gcc/testsuite/ChangeLog: PR middle-end/78622 * gcc.c-torture/execute/pr78622.c: New test. * gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined behavior inadvertently introduced in a previous commit. Tighten up final checking. * gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity. Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and add a final optimization check. * gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases. diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index 99a635a..95a8710 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -795,6 +795,59 @@ get_width_and_precision (const conversion_spec &spec, *pprec = prec; } +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual + argument, due to the conversion from either *ARGMIN or *ARGMAX to + the type of the directive's formal argument it's possible for both + to result in the same number of bytes or a range of bytes that's + less than the number of bytes that would result from formatting + some other value in the range [*ARGMIN, *ARGMAX]. This can be + determined by checking for the actual argument being in the range + of the type of the directive. If it isn't it must be assumed to + take on the full range of the directive's type. + Return true when the range has been adjusted to the range of + DIRTYPE, false otherwise. */ I wish I'd counted how many times I read that. + +static bool +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) +{ + unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin)); + unsigned dirprec = TYPE_PRECISION (dirtype); + + /* The logic here was inspired/lifted from the CONVERT_EXPR_CODE_P + branch in the extract_range_from_unary_expr function in tree-vrp.c. + */ Formatting it. If the comment-close won't fit, then line wrap prior to the last word in the comment. + + if (TREE_CODE (*argmin) == INTEGER_CST + && TREE_CODE (*argmax) == INTEGER_CST + && (dirprec >= argprec + || integer_zerop (int_const_binop (RSHIFT_EXPR, +int_const_binop (MINUS_EXPR, + *argmax, + *argmin), +size_int (dirprec) + { +
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On 12/01/2016 02:15 AM, Jakub Jelinek wrote: On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote: Isn't this too simplistic? I mean, if you have say dirtype of signed char and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg has range 32, 64, while I think your routine will yield -128, 127 (well, 0 as min and -128 as max as that is what you return for signed type). Can't you subtract argmax - argmin (best just in wide_int, no need to build trees), and use what you have just for the case where that number doesn't fit into the narrower precision, otherwise if argmin - (dirtype) argmin == argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax as the range, and in case that it crosses a boundary figure if you can do anything than the above? Guess all cases of signed/unsigned dirtype and/or argtype need to be considered. Richard noted that you could have a look at CONVERT_EXPR_CODE_P handling in extract_range_from_unary_expr. I think it is the || (vr0.type == VR_RANGE && integer_zerop (int_const_binop (RSHIFT_EXPR, int_const_binop (MINUS_EXPR, vr0.max, vr0.min), size_int (TYPE_PRECISION (outer_type))) part that is important here for the narrowing conversion. Thanks, that was a helpful suggestion! Attached is an update that follows the vrp approach. I assume the infinity stuff is specific to the VRP pass and not something I need to worry about here. Martin PS To your earlier question, argmin and argmax have the same meaning in the added helper function as in its caller. PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping gcc/ChangeLog: PR middle-end/78622 * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange rather than res.bounded. (adjust_range_for_overflow): New function. (format_integer): Always set res.bounded to true unless either precision or width is specified and unknown. Call adjust_range_for_overflow. (format_directive): Remove vestigial quoting. (add_bytes): Correct the computation of boundrange used to decide whether a warning is of a "maybe" or "defnitely" kind. gcc/testsuite/ChangeLog: PR middle-end/78622 * gcc.c-torture/execute/pr78622.c: New test. * gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined behavior inadvertently introduced in a previous commit. Tighten up final checking. * gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity. Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases. * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same. * gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and add a final optimization check. * gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases. diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index 99a635a..95a8710 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -637,7 +637,7 @@ min_bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result &res) if (HOST_WIDE_INT_MAX <= navail) return navail; - if (1 < warn_format_length || res.bounded) + if (1 < warn_format_length || res.knownrange) { /* At level 2, or when all directives output an exact number of bytes or when their arguments were bounded by known @@ -795,6 +795,59 @@ get_width_and_precision (const conversion_spec &spec, *pprec = prec; } +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual + argument, due to the conversion from either *ARGMIN or *ARGMAX to + the type of the directive's formal argument it's possible for both + to result in the same number of bytes or a range of bytes that's + less than the number of bytes that would result from formatting + some other value in the range [*ARGMIN, *ARGMAX]. This can be + determined by checking for the actual argument being in the range + of the type of the directive. If it isn't it must be assumed to + take on the full range of the directive's type. + Return true when the range has been adjusted to the range of + DIRTYPE, false otherwise. */ + +static bool +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) +{ + unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin)); + unsigned dirprec = TYPE_PRECISION (dirtype); + + /* The logic here was inspired/lifted from the CONVERT_EXPR_CODE_P + branch in the extract_range_from_unary_expr function in tree-vrp.c. + */ + + if (TREE_CODE (*argmin) == INTEGER_CST + && TREE_CODE (*argmax) == INTEGER_CST + && (dirprec >= argprec + || integer_zerop (int_const_binop (RSHIFT_EXPR, + int_const_binop (MINUS_EXPR, + *argmax, + *argmin), + size_int (dirprec) + { +*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0, false); +*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0, false); +return false; + } + + tree dirmin
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote: > Isn't this too simplistic? I mean, if you have say dirtype of signed char > and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg > has range 32, 64, while I think your routine will yield -128, 127 (well, > 0 as min and -128 as max as that is what you return for signed type). > > Can't you subtract argmax - argmin (best just in wide_int, no need to build > trees), and use what you have just for the case where that number doesn't > fit into the narrower precision, otherwise if argmin - (dirtype) argmin > == argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax > as the range, and in case that it crosses a boundary figure if you can do > anything than the above? Guess all cases of signed/unsigned dirtype and/or > argtype need to be considered. Richard noted that you could have a look at CONVERT_EXPR_CODE_P handling in extract_range_from_unary_expr. I think it is the || (vr0.type == VR_RANGE && integer_zerop (int_const_binop (RSHIFT_EXPR, int_const_binop (MINUS_EXPR, vr0.max, vr0.min), size_int (TYPE_PRECISION (outer_type))) part that is important here for the narrowing conversion. Jakub
Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
On Wed, Nov 30, 2016 at 08:26:04PM -0700, Martin Sebor wrote: > @@ -795,6 +795,43 @@ get_width_and_precision (const conversion_spec &spec, >*pprec = prec; > } > > +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual > + argument, due to the conversion from either *ARGMIN or *ARGMAX to > + the type of the directive's formal argument it's possible for both > + to result in the same number of bytes or a range of bytes that's > + less than the number of bytes that would result from formatting > + some other value in the range [*ARGMIN, *ARGMAX]. This can be > + determined by checking for the actual argument being in the range > + of the type of the directive. If it isn't it must be assumed to > + take on the full range of the directive's type. > + Return true when the range has been adjusted, false otherwise. */ > + > +static bool > +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) > +{ > + tree dirmin = TYPE_MIN_VALUE (dirtype); > + tree dirmax = TYPE_MAX_VALUE (dirtype); > + > + if (tree_int_cst_lt (*argmin, dirmin) > + || tree_int_cst_lt (dirmax, *argmin) > + || tree_int_cst_lt (*argmax, dirmin) > + || tree_int_cst_lt (dirmax, *argmax)) > +{ > + if (TYPE_UNSIGNED (dirtype)) > + { > + *argmin = dirmin; > + *argmax = dirmax; > + } > + else > + { > + *argmin = integer_zero_node; > + *argmax = dirmin; > + } > + return true; > +} Isn't this too simplistic? I mean, if you have say dirtype of signed char and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg has range 32, 64, while I think your routine will yield -128, 127 (well, 0 as min and -128 as max as that is what you return for signed type). Can't you subtract argmax - argmin (best just in wide_int, no need to build trees), and use what you have just for the case where that number doesn't fit into the narrower precision, otherwise if argmin - (dirtype) argmin == argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax as the range, and in case that it crosses a boundary figure if you can do anything than the above? Guess all cases of signed/unsigned dirtype and/or argtype need to be considered. Also, is argmin and argmax in this case the actual range (what should go into res.arg{min,max}), or the values with shortest/longest representation? Wouldn't it be better to always compute the range of values that can be printed and only later on (after all VR_RANGE and VR_VARYING handling) transform that into the number with shortest/longest representation in that range? Perhaps even using different variable names for the latter would make things clearer (argshortest, arglongest or whatever). Jakub