Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-12 Thread Jeff Law

On 12/11/2016 05:21 PM, Martin Sebor wrote:

So I think the return value needs a bit of clarification here.  Take
guidance from our discussion on this thread.

OK with that fixed.

jeff


The "strange test failures​" that I wrote about in a separate thread
late last week prompted me to re-review the patch and add more test
cases.  Those in turn exposed a bug in the adjust_range_for_overflow
function involving types of the same precision but different sign
where converting an unsigned range with an upper bound in excess of
the directive's TYPE_MAX would incorrectly accept the converted range
even though the new upper bound was less than the lower bound.

The updated  patch corrects this oversight.  In addition, it adjusts
the handling of the obscure corner case of zero precision and zero
argument which results in zero bytes (except in some even more
obscure cases involving some flags for some conversions).  For
instance:

  printf ("%.0i", 0);

results in zero bytes, but

  printf ("%+.0i", 0);

results in 1 byte (and prints '+').  This is tracked in bug 78606.

Although the differences between the approved patch and the update
are very small I repost it in case one of you would like to double
check them.  If not I'll commit the updated patch later in the week.

Martin

gcc-78622.diff


PR middle-end/78622 - -Wformat-length/-fprintf-return-value incorrect with 
overflow/wrapping
PR middle-end78606 - -Wformat-length/-fprintf-return-value incorrect for %+.0i 
and %.0o with zero value

gcc/ChangeLog:

PR middle-end/78622
* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
rather than res.bounded.
(get_width_and_precision): Set precision to -1 when negative.
(adjust_range_for_overflow): New function.
(format_integer): Correct the handling of the space, plus, and pound
flags, and the special case of zero precision.
Always set res.bounded to true unless either precision or width
is specified and unknown.
Call adjust_range_for_overflow.
Avoid use zero as the shortest value when precision is specified
but unknown.
(format_directive): Remove vestigial quoting.  Always inform of
argument value or range when it's available.
(add_bytes): Correct the computation of boundrange used to
decide whether a warning is of a "maybe" or "defnitely" kind.

gcc/testsuite/ChangeLog:

PR middle-end/78622
* gcc.c-torture/execute/pr78622.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined
behavior inadvertently introduced in a previous commit.  Tighten
up final checking.
* gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity.
Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-5.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and
add a final optimization check.
* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.
* gcc.dg/tree-ssa/pr78622.c: New test.


OK.

jeff



Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-12 Thread Jeff Law

On 12/07/2016 12:17 PM, Martin Sebor wrote:

OK.  So is the hangup really a problem in how the return type is
documented?  I parsed the comment as essentially saying we return true
if the range gets adjusted in any way -- thus a sign change in the first
block would qualify, but we returned false which seemed inconsistent.

Looking at it again, what I think it's saying is we're returning true
only for a subset of adjustments to the range, ie, those cases where the
postcondition does not hold.  Correct?


Correct.  Would changing the description of the return value to
this make it clearer?

   Return true when the range has been adjusted to the full unsigned
   range of DIRTYPE, [0, DIRTYPE_MAX], false otherwise.
Yea, that does help.  I see you've got an updated version posted.  Let 
me take a final (?) looksie now that I have a better understanding of 
the return value.  I think that was the biggest stumbling block on my side.


Jeff



Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-11 Thread Martin Sebor

So I think the return value needs a bit of clarification here.  Take
guidance from our discussion on this thread.

OK with that fixed.

jeff


The "strange test failures​" that I wrote about in a separate thread
late last week prompted me to re-review the patch and add more test
cases.  Those in turn exposed a bug in the adjust_range_for_overflow
function involving types of the same precision but different sign
where converting an unsigned range with an upper bound in excess of
the directive's TYPE_MAX would incorrectly accept the converted range
even though the new upper bound was less than the lower bound.

The updated  patch corrects this oversight.  In addition, it adjusts
the handling of the obscure corner case of zero precision and zero
argument which results in zero bytes (except in some even more
obscure cases involving some flags for some conversions).  For
instance:

  printf ("%.0i", 0);

results in zero bytes, but

  printf ("%+.0i", 0);

results in 1 byte (and prints '+').  This is tracked in bug 78606.

Although the differences between the approved patch and the update
are very small I repost it in case one of you would like to double
check them.  If not I'll commit the updated patch later in the week.

Martin
PR middle-end/78622 - -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping
PR middle-end78606 - -Wformat-length/-fprintf-return-value incorrect for %+.0i and %.0o with zero value

gcc/ChangeLog:

	PR middle-end/78622
	* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
	rather than res.bounded.
	(get_width_and_precision): Set precision to -1 when negative.
	(adjust_range_for_overflow): New function.
	(format_integer): Correct the handling of the space, plus, and pound
	flags, and the special case of zero precision.
	Always set res.bounded to true unless either precision or width
	is specified and unknown.
	Call adjust_range_for_overflow.
	Avoid use zero as the shortest value when precision is specified
	but unknown.
	(format_directive): Remove vestigial quoting.  Always inform of
	argument value or range when it's available.
	(add_bytes): Correct the computation of boundrange used to
	decide whether a warning is of a "maybe" or "defnitely" kind.

gcc/testsuite/ChangeLog:

	PR middle-end/78622
	* gcc.c-torture/execute/pr78622.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined
	behavior inadvertently introduced in a previous commit.  Tighten
	up final checking.
	* gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity.
	Add test cases.
	* gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-5.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and
	add a final optimization check.
	* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.
	* gcc.dg/tree-ssa/pr78622.c: New test.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 99a635a..905917d 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -637,7 +637,7 @@ min_bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result &res)
   if (HOST_WIDE_INT_MAX <= navail)
 return navail;
 
-  if (1 < warn_format_length || res.bounded)
+  if (1 < warn_format_length || res.knownrange)
 {
   /* At level 2, or when all directives output an exact number
 	 of bytes or when their arguments were bounded by known
@@ -785,7 +785,7 @@ get_width_and_precision (const conversion_spec &spec,
 	{
 	  prec = tree_to_shwi (spec.star_precision);
 	  if (prec < 0)
-	prec = 0;
+	prec = -1;
 	}
   else
 	prec = HOST_WIDE_INT_MIN;
@@ -795,6 +795,69 @@ get_width_and_precision (const conversion_spec &spec,
   *pprec = prec;
 }
 
+/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
+   argument, due to the conversion from either *ARGMIN or *ARGMAX to
+   the type of the directive's formal argument it's possible for both
+   to result in the same number of bytes or a range of bytes that's
+   less than the number of bytes that would result from formatting
+   some other value in the range [*ARGMIN, *ARGMAX].  This can be
+   determined by checking for the actual argument being in the range
+   of the type of the directive.  If it isn't it must be assumed to
+   take on the full range of the directive's type.
+   Return true when the range has been adjusted to the full unsigned
+   range of DIRTYPE, or [0, DIRTYPE_MAX], and false otherwise.  */
+
+static bool
+adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
+{
+  tree argtype = TREE_TYPE (*argmin);
+  unsigned argprec = TYPE_PRECISION (argtype);
+  unsigned dirprec = TYPE_PRECISION (dirtype);
+
+  /* If the actual argument and the directive's argument have the same
+ precision and sign there can be no overflow and so there is nothing
+ t

Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-07 Thread Martin Sebor

On 12/07/2016 11:43 AM, Jeff Law wrote:

On 12/02/2016 03:54 PM, Martin Sebor wrote:

Thanks for looking at this!  I realize it's dense code and not easy
to make sense out of.

+
+  if (TREE_CODE (*argmin) == INTEGER_CST
+  && TREE_CODE (*argmax) == INTEGER_CST
+  && (dirprec >= argprec
+  || integer_zerop (int_const_binop (RSHIFT_EXPR,
+ int_const_binop (MINUS_EXPR,
+  *argmax,
+  *argmin),
+ size_int (dirprec)
+  {
+*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0,
false);
+*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0,
false);
+return false;
+  }

So in this case we're not changing the values, we're just converting
them to a equal or wider type, right?  Thus no adjustment (what about a
sign change?  Is that an adjustment?) and we return false per the
function's specifications.


This casts the value to the destination type, so it may result in
different values.  The important postcondition it preserves is that
the difference ARGMAX - ARGMIN is less than or equal to TYPE_MAX of
the directive's unsigned TYPE.  (If it isn't, the range cannot be
converted.)  This lets us take, say:

  int f (int i)
  {
if (i < 1024 || 1033 < i)
   i = 1024;

return snprintf (0, 0, "%hhi", i);
  }

and fold the function call into 1 because (signed char)1024 is 0 and
(signed char)1033 is 9, and every other value in that same original
range is also in the same range after the conversion.  We know it's
safe because (1033 - 1024 <= UCHAR_MAX) holds.

But the code in question is guarded by dirprec >= argprec.  Thus aren't
we converting to an equal or wider type?  In the case of converting to a
wider type, ISTM the values won't change and thus we return false.


I just saw your other response after I typed this up (thanks!) but
so just to close the loop:

You're right, when converting to a wider type the values won't change.

The guard is an OR expression:

  dirprec >= argprec
  OR
  0 == ((ARGMAX - ARGMIN) >> TYPE_PRECISION (dirtype))

and so when (dirprec < argprec) holds the values might change if
the other operand is true.  In both of these cases the function
returns false to indicate that the original range the arguments
were in hasn't changed as a result.



If we are converting to the same sized type, but with a different
signedness, then the values will have been adjusted and ISTM we ought to
be returning true in that case.  But the code actually returns false.

I must be missing something here.


The return value of the function is only used to decide the phrasing
of the notes printed after warnings and what values to include in
them.  Returning false means that the notes will say something like
"directive argument in the range [ARGMIN, ARGMAX]" where the values
are those determined by VRP.  Returning true means that the note
will instead say "using the range [TYPE_MIN, 0] for directive
argument." This distinction will (I hope) let the user know that
the second case is less accurate than the first.  It's somewhat
subtle but I think useful to understand why the checker decided
to warn.  Once I'm done with all my work I'd like to document this
in more  detail on the Wiki.



What about overflows when either argmin or argmax won't fit into dirtype
without an overflow?  What's the right behavior there?


That's fine just as long as the property above holds.

I think the algorithm works but the code came from tree-vrp where
there are all sorts of additional checks for some infinities which
VRP distinguishes from type limits like SCHAR_MIN and _MAX.  I don't
fully understand those and so I'd feel better if someone who does
double checked this code.

That's what prompted my question about overflows.  It was a general
concern after reading the VRP code.  I did not have a specific case in
mind that would be mis-handled.


Okay.


+  tree dirmin = TYPE_MIN_VALUE (dirtype);
+  tree dirmax = TYPE_MAX_VALUE (dirtype);

From this point onward argmin/argmax are either not integers or we're
doing a narrowing conversion, right?  In both cases the fact that we're
doing a narrowing conversion constrains the range


Argmin and argmax are always integers.  The rest of the function
handles the case where the postcondition above doesn't hold (e.g.,
in function g above).

OK.  So is the hangup really a problem in how the return type is
documented?  I parsed the comment as essentially saying we return true
if the range gets adjusted in any way -- thus a sign change in the first
block would qualify, but we returned false which seemed inconsistent.

Looking at it again, what I think it's saying is we're returning true
only for a subset of adjustments to the range, ie, those cases where the
postcondition does not hold.  Correct?


Correct.  Would changing the description of the return value to
this make it clearer?

   Return true when the range has been adjuste

Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-07 Thread Jeff Law

On 12/05/2016 08:43 PM, Martin Sebor wrote:


Martin

$ cat b.c && /build/gcc-78622/gcc/xgcc -B /build/gcc-78622/gcc -O2 -S
-Wall -Wextra -Wpedantic b.c
char d[1];

void f (int i)
{
  if (i < 1024 || 1033 < i) i = 1024;

  __builtin_sprintf (d + 1, "%hhi", i);
}

void g (int i)
{
  if (i < 1024 || 3456 < i) i = 1024;

  __builtin_sprintf (d + 1, "%hhi", i);
}

b.c: In function ‘f’:
b.c:7:30: warning: ‘%hhi’ directive writing 1 byte into a region of size
0 [-Wformat-length=]
   __builtin_sprintf (d + 1, "%hhi", i);
  ^~~~
b.c:7:29: note: directive argument in the range [1024, 1033]
   __builtin_sprintf (d + 1, "%hhi", i);
 ^~
b.c:7:3: note: format output 2 bytes into a destination of size 0
   __builtin_sprintf (d + 1, "%hhi", i);
   ^~~~
b.c: In function ‘g’:
b.c:14:30: warning: ‘%hhi’ directive writing between 1 and 4 bytes into
a region of size 0 [-Wformat-length=]
   __builtin_sprintf (d + 1, "%hhi", i);
  ^~~~
b.c:14:29: note: using the range [0, -128] for directive argument
   __builtin_sprintf (d + 1, "%hhi", i);
 ^~
b.c:14:3: note: format output between 2 and 5 bytes into a destination
of size 0
   __builtin_sprintf (d + 1, "%hhi", i);
   ^~~~


gcc-78622.diff


PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value 
incorrect with overflow/wrapping

gcc/ChangeLog:

PR middle-end/78622
* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
rather than res.bounded.
(adjust_range_for_overflow): New function.
(format_integer): Always set res.bounded to true unless either
precision or width is specified and unknown.
Call adjust_range_for_overflow.
(format_directive): Remove vestigial quoting.  Always inform of
argument value or range when it's available.
(add_bytes): Correct the computation of boundrange used to
decide whether a warning is of a "maybe" or "defnitely" kind.

gcc/testsuite/ChangeLog:

PR middle-end/78622
* gcc.c-torture/execute/pr78622.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined
behavior inadvertently introduced in a previous commit.  Tighten
up final checking.
* gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity.
Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-5.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and
add a final optimization check.
* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 99a635a..a888f55 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -795,6 +795,59 @@ get_width_and_precision (const conversion_spec &spec,
   *pprec = prec;
 }

+/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
+   argument, due to the conversion from either *ARGMIN or *ARGMAX to
+   the type of the directive's formal argument it's possible for both
+   to result in the same number of bytes or a range of bytes that's
+   less than the number of bytes that would result from formatting
+   some other value in the range [*ARGMIN, *ARGMAX].  This can be
+   determined by checking for the actual argument being in the range
+   of the type of the directive.  If it isn't it must be assumed to
+   take on the full range of the directive's type.
+   Return true when the range has been adjusted to the range of
+   DIRTYPE, false otherwise.  */
So I think the return value needs a bit of clarification here.  Take 
guidance from our discussion on this thread.


OK with that fixed.

jeff


Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-07 Thread Jeff Law

On 12/02/2016 03:54 PM, Martin Sebor wrote:

Thanks for looking at this!  I realize it's dense code and not easy
to make sense out of.

+
+  if (TREE_CODE (*argmin) == INTEGER_CST
+  && TREE_CODE (*argmax) == INTEGER_CST
+  && (dirprec >= argprec
+  || integer_zerop (int_const_binop (RSHIFT_EXPR,
+ int_const_binop (MINUS_EXPR,
+  *argmax,
+  *argmin),
+ size_int (dirprec)
+  {
+*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0,
false);
+*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0,
false);
+return false;
+  }

So in this case we're not changing the values, we're just converting
them to a equal or wider type, right?  Thus no adjustment (what about a
sign change?  Is that an adjustment?) and we return false per the
function's specifications.


This casts the value to the destination type, so it may result in
different values.  The important postcondition it preserves is that
the difference ARGMAX - ARGMIN is less than or equal to TYPE_MAX of
the directive's unsigned TYPE.  (If it isn't, the range cannot be
converted.)  This lets us take, say:

  int f (int i)
  {
if (i < 1024 || 1033 < i)
   i = 1024;

return snprintf (0, 0, "%hhi", i);
  }

and fold the function call into 1 because (signed char)1024 is 0 and
(signed char)1033 is 9, and every other value in that same original
range is also in the same range after the conversion.  We know it's
safe because (1033 - 1024 <= UCHAR_MAX) holds.
But the code in question is guarded by dirprec >= argprec.  Thus aren't 
we converting to an equal or wider type?  In the case of converting to a 
wider type, ISTM the values won't change and thus we return false.


If we are converting to the same sized type, but with a different 
signedness, then the values will have been adjusted and ISTM we ought to 
be returning true in that case.  But the code actually returns false.


I must be missing something here.








What about overflows when either argmin or argmax won't fit into dirtype
without an overflow?  What's the right behavior there?


That's fine just as long as the property above holds.

I think the algorithm works but the code came from tree-vrp where
there are all sorts of additional checks for some infinities which
VRP distinguishes from type limits like SCHAR_MIN and _MAX.  I don't
fully understand those and so I'd feel better if someone who does
double checked this code.
That's what prompted my question about overflows.  It was a general 
concern after reading the VRP code.  I did not have a specific case in 
mind that would be mis-handled.








+
+  tree dirmin = TYPE_MIN_VALUE (dirtype);
+  tree dirmax = TYPE_MAX_VALUE (dirtype);

From this point onward argmin/argmax are either not integers or we're
doing a narrowing conversion, right?  In both cases the fact that we're
doing a narrowing conversion constrains the range


Argmin and argmax are always integers.  The rest of the function
handles the case where the postcondition above doesn't hold (e.g.,
in function g above).
OK.  So is the hangup really a problem in how the return type is 
documented?  I parsed the comment as essentially saying we return true 
if the range gets adjusted in any way -- thus a sign change in the first 
block would qualify, but we returned false which seemed inconsistent.


Looking at it again, what I think it's saying is we're returning true 
only for a subset of adjustments to the range, ie, those cases where the 
postcondition does not hold.  Correct?








+
+  if (TYPE_UNSIGNED (dirtype))
+{
+  *argmin = dirmin;
+  *argmax = dirmax;
+}
+  else
+{
+  *argmin = integer_zero_node;
+  *argmax = dirmin;
+}

Should this be dirmax? Am I missing something here?


It's dirmin because for a signed type, due to the sign, it results
in more bytes on output than dirmin would.  (E.g., with SCHAR_MIN
= -128 and SCHAR_MAX = 127 it's four bytes vs three.)

Ah.  I understand that.  THanks for clarifying.

jeff


Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-05 Thread Martin Sebor

On 12/05/2016 01:26 PM, Jakub Jelinek wrote:

Hi!

On Thu, Dec 01, 2016 at 07:31:18PM -0700, Martin Sebor wrote:

+static bool
+adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
+{
+  if (TYPE_UNSIGNED (dirtype))
+{
+  *argmin = dirmin;
+  *argmax = dirmax;
+}
+  else
+{
+  *argmin = integer_zero_node;
+  *argmax = dirmin;
+}


I still don't really like this mixing of ranges of values and picking of values
which result in shortest and longest representation, it is confusing and
will be a maintainance nightmare.

IMHO much cleaner is first figure out the range the argument (in argtype)
has.  I.e. look at VR_RANGE and if it is missing, perhaps find out another
argtype and in any case, use TYPE_{MIN,MAX}_VALUE (argtype) as the range.
I think that should probably be the range presented to the user in
diagnostics (i.e. res.arg{min,max}).

Next step is to adjust this range for the case where dirtype is different
from argtype.  This should be done regardless of what way you get the first
range from (whether from VR_RANGE or VR_VARYING etc.).  The result of this
still should be a range of values in dirtype.

And the last step should be to pick the values from that range which
has shortest and longest representation.  For unsigned dirtype
that are the bounds of the range from earlier step, for signed dirtype
something different (if both bounds are >= 0, then also just those bounds,
if both bounds are < 0, then the bounds swapped, otherwise 0 as minimum,
then e.g. try both bounds what has longer representation, or take some short
path e.g. if abs of the negative bound is >= the positive bound, then use
the negative bound as longest, otherwise try both).


I take this as your confirmation that the function does do the right
thing.  If not, or if you can't confirm that, it would be helpful if
you could let me know so that I can ask Richard to look it over.

The pass already does pretty much what you describe but I was able
to simplify the format_integer function somewhat and also arrange
for the informational note mentioning the range of argument values
to more closely reflect what I think you suggest.

With the attached patch the following example produces the output
below.  In the first function, because the pass uses the actual
argument range unchanged, the warning prints the exact byte count
and the note the original range.  In the second function, the range
must be adjusted to that of the directive's type, and the warning
prints a range of bytes and the note says "using the range [0, -128]"
to indicate that it used a different range than the range of
the actual argument.  It might be possible to change the note to
say something like "using the range of type 'signed char' or
something like that, to make it even clearer that the whole range
of the type is being used.  But these further tweaks should be made
independently of this patch, perhaps as part bug 77696 if and when
I get to it.

Martin

$ cat b.c && /build/gcc-78622/gcc/xgcc -B /build/gcc-78622/gcc -O2 -S 
-Wall -Wextra -Wpedantic b.c

char d[1];

void f (int i)
{
  if (i < 1024 || 1033 < i) i = 1024;

  __builtin_sprintf (d + 1, "%hhi", i);
}

void g (int i)
{
  if (i < 1024 || 3456 < i) i = 1024;

  __builtin_sprintf (d + 1, "%hhi", i);
}

b.c: In function ‘f’:
b.c:7:30: warning: ‘%hhi’ directive writing 1 byte into a region of size 
0 [-Wformat-length=]

   __builtin_sprintf (d + 1, "%hhi", i);
  ^~~~
b.c:7:29: note: directive argument in the range [1024, 1033]
   __builtin_sprintf (d + 1, "%hhi", i);
 ^~
b.c:7:3: note: format output 2 bytes into a destination of size 0
   __builtin_sprintf (d + 1, "%hhi", i);
   ^~~~
b.c: In function ‘g’:
b.c:14:30: warning: ‘%hhi’ directive writing between 1 and 4 bytes into 
a region of size 0 [-Wformat-length=]

   __builtin_sprintf (d + 1, "%hhi", i);
  ^~~~
b.c:14:29: note: using the range [0, -128] for directive argument
   __builtin_sprintf (d + 1, "%hhi", i);
 ^~
b.c:14:3: note: format output between 2 and 5 bytes into a destination 
of size 0

   __builtin_sprintf (d + 1, "%hhi", i);
   ^~~~

PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping

gcc/ChangeLog:

	PR middle-end/78622
	* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
	rather than res.bounded.
	(adjust_range_for_overflow): New function.
	(format_integer): Always set res.bounded to true unless either
	precision or width is specified and unknown.
	Call adjust_range_for_overflow.
	(format_directive): Remove vestigial quoting.  Always inform of
	argument value or range when it's available.
	(add_bytes): Correct the computation of boundrange used to
	decide whether a warning is of a "maybe" or "defnitely" kind.

gcc/testsuite/ChangeLog:

	PR middle-end

Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-05 Thread Jakub Jelinek
Hi!

On Thu, Dec 01, 2016 at 07:31:18PM -0700, Martin Sebor wrote:
> +static bool
> +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
> +{
> +  if (TYPE_UNSIGNED (dirtype))
> +{
> +  *argmin = dirmin;
> +  *argmax = dirmax;
> +}
> +  else
> +{
> +  *argmin = integer_zero_node;
> +  *argmax = dirmin;
> +}

I still don't really like this mixing of ranges of values and picking of values
which result in shortest and longest representation, it is confusing and
will be a maintainance nightmare.

IMHO much cleaner is first figure out the range the argument (in argtype)
has.  I.e. look at VR_RANGE and if it is missing, perhaps find out another
argtype and in any case, use TYPE_{MIN,MAX}_VALUE (argtype) as the range.
I think that should probably be the range presented to the user in
diagnostics (i.e. res.arg{min,max}).

Next step is to adjust this range for the case where dirtype is different
from argtype.  This should be done regardless of what way you get the first
range from (whether from VR_RANGE or VR_VARYING etc.).  The result of this
still should be a range of values in dirtype.

And the last step should be to pick the values from that range which
has shortest and longest representation.  For unsigned dirtype
that are the bounds of the range from earlier step, for signed dirtype
something different (if both bounds are >= 0, then also just those bounds,
if both bounds are < 0, then the bounds swapped, otherwise 0 as minimum,
then e.g. try both bounds what has longer representation, or take some short
path e.g. if abs of the negative bound is >= the positive bound, then use
the negative bound as longest, otherwise try both).

Jakub


Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-02 Thread Martin Sebor

Thanks for looking at this!  I realize it's dense code and not easy
to make sense out of.


PR middle-end/78622 - [7 Regression]
-Wformat-length/-fprintf-return-value incorrect with overflow/wrapping

gcc/ChangeLog:

PR middle-end/78622
* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
rather than res.bounded.
(adjust_range_for_overflow): New function.
(format_integer): Always set res.bounded to true unless either
precision or width is specified and unknown.
Call adjust_range_for_overflow.
(format_directive): Remove vestigial quoting.
(add_bytes): Correct the computation of boundrange used to
decide whether a warning is of a "maybe" or "defnitely" kind.

s/defnitely/definitely/


Will fix.


+/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
+   argument, due to the conversion from either *ARGMIN or *ARGMAX to
+   the type of the directive's formal argument it's possible for both
+   to result in the same number of bytes or a range of bytes that's
+   less than the number of bytes that would result from formatting
+   some other value in the range [*ARGMIN, *ARGMAX].  This can be
+   determined by checking for the actual argument being in the range
+   of the type of the directive.  If it isn't it must be assumed to
+   take on the full range of the directive's type.
+   Return true when the range has been adjusted to the range of
+   DIRTYPE, false otherwise.  */

I wish I'd counted how many times I read that.


Let me see if I can word it better.




+
+static bool
+adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
+{
+  unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin));
+  unsigned dirprec = TYPE_PRECISION (dirtype);
+
+  /* The logic here was inspired/lifted from  the CONVERT_EXPR_CODE_P
+ branch in the extract_range_from_unary_expr function in tree-vrp.c.
+  */

Formatting it.  If the comment-close won't fit, then line wrap prior to
the last word in the comment.


Okay.  I didn't remember what the exact rules were.


+
+  if (TREE_CODE (*argmin) == INTEGER_CST
+  && TREE_CODE (*argmax) == INTEGER_CST
+  && (dirprec >= argprec
+  || integer_zerop (int_const_binop (RSHIFT_EXPR,
+ int_const_binop (MINUS_EXPR,
+  *argmax,
+  *argmin),
+ size_int (dirprec)
+  {
+*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0,
false);
+*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0,
false);
+return false;
+  }

So in this case we're not changing the values, we're just converting
them to a equal or wider type, right?  Thus no adjustment (what about a
sign change?  Is that an adjustment?) and we return false per the
function's specifications.


This casts the value to the destination type, so it may result in
different values.  The important postcondition it preserves is that
the difference ARGMAX - ARGMIN is less than or equal to TYPE_MAX of
the directive's unsigned TYPE.  (If it isn't, the range cannot be
converted.)  This lets us take, say:

  int f (int i)
  {
if (i < 1024 || 1033 < i)
   i = 1024;

return snprintf (0, 0, "%hhi", i);
  }

and fold the function call into 1 because (signed char)1024 is 0 and
(signed char)1033 is 9, and every other value in that same original
range is also in the same range after the conversion.  We know it's
safe because (1033 - 1024 <= UCHAR_MAX) holds.  But in in this:

  int g (int i)
  {
if (i < 1024 || 1289 < i)
   i = 1024;

return snprintf (0, 0, "%hhi", i);
  }

even though (signed char)1024 is 0 and (signed char)1289 is also 9
like above, since (1289 - 1024) is 265 and thus greater than UCHAR_MAX,
folding the result wouldn't be safe (for example, (sighed char)1034
yields 10).

I picture this as a window bounded by the range of the directive's
type sliding within another window bounded by the argument's type:

  [<-[...dirtype...]--->]

the outer brackets delimit the range of the argument and the inner
ones the directive's.  The smaller window can slide left and right
and it can shrink but it can't be wider that the directive's type's
range.


What about overflows when either argmin or argmax won't fit into dirtype
without an overflow?  What's the right behavior there?


That's fine just as long as the property above holds.

I think the algorithm works but the code came from tree-vrp where
there are all sorts of additional checks for some infinities which
VRP distinguishes from type limits like SCHAR_MIN and _MAX.  I don't
fully understand those and so I'd feel better if someone who does
double checked this code.




+
+  tree dirmin = TYPE_MIN_VALUE (dirtype);
+  tree dirmax = TYPE_MAX_VALUE (dirtype);

From this point onward argmin/argmax are either not integers or we're
doing a narrowing conversion, right?  In both cases the fact that we're
doing a narrowing conversion constrains th

Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-02 Thread Jeff Law

On 12/01/2016 07:31 PM, Martin Sebor wrote:

On 12/01/2016 02:15 AM, Jakub Jelinek wrote:

On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote:

Isn't this too simplistic?  I mean, if you have say dirtype of signed
char
and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg
has range 32, 64, while I think your routine will yield -128, 127 (well,
0 as min and -128 as max as that is what you return for signed type).

Can't you subtract argmax - argmin (best just in wide_int, no need to
build
trees), and use what you have just for the case where that number
doesn't
fit into the narrower precision, otherwise if argmin - (dirtype) argmin
== argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype)
argmax
as the range, and in case that it crosses a boundary figure if you
can do
anything than the above?  Guess all cases of signed/unsigned dirtype
and/or
argtype need to be considered.


Richard noted that you could have a look at CONVERT_EXPR_CODE_P
handling in extract_range_from_unary_expr.  I think it is the
  || (vr0.type == VR_RANGE
  && integer_zerop (int_const_binop (RSHIFT_EXPR,
   int_const_binop (MINUS_EXPR, vr0.max, vr0.min),
 size_int (TYPE_PRECISION (outer_type)))
part that is important here for the narrowing conversion.


Thanks, that was a helpful suggestion!  Attached is an update that
follows the vrp approach.  I assume the infinity stuff is specific
to the VRP pass and not something I need to worry about here.

Martin

PS To your earlier question, argmin and argmax have the same meaning
in the added helper function as in its caller.

gcc-78622.diff


PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value 
incorrect with overflow/wrapping

gcc/ChangeLog:

PR middle-end/78622
* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
rather than res.bounded.
(adjust_range_for_overflow): New function.
(format_integer): Always set res.bounded to true unless either
precision or width is specified and unknown.
Call adjust_range_for_overflow.
(format_directive): Remove vestigial quoting.
(add_bytes): Correct the computation of boundrange used to
decide whether a warning is of a "maybe" or "defnitely" kind.

s/defnitely/definitely/



gcc/testsuite/ChangeLog:

PR middle-end/78622
* gcc.c-torture/execute/pr78622.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined
behavior inadvertently introduced in a previous commit.  Tighten
up final checking.
* gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity.
Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and
add a final optimization check.
* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.





diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 99a635a..95a8710 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -795,6 +795,59 @@ get_width_and_precision (const conversion_spec &spec,
   *pprec = prec;
 }

+/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
+   argument, due to the conversion from either *ARGMIN or *ARGMAX to
+   the type of the directive's formal argument it's possible for both
+   to result in the same number of bytes or a range of bytes that's
+   less than the number of bytes that would result from formatting
+   some other value in the range [*ARGMIN, *ARGMAX].  This can be
+   determined by checking for the actual argument being in the range
+   of the type of the directive.  If it isn't it must be assumed to
+   take on the full range of the directive's type.
+   Return true when the range has been adjusted to the range of
+   DIRTYPE, false otherwise.  */

I wish I'd counted how many times I read that.


+
+static bool
+adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
+{
+  unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin));
+  unsigned dirprec = TYPE_PRECISION (dirtype);
+
+  /* The logic here was inspired/lifted from  the CONVERT_EXPR_CODE_P
+ branch in the extract_range_from_unary_expr function in tree-vrp.c.
+  */
Formatting it.  If the comment-close won't fit, then line wrap prior to 
the last word in the comment.




+
+  if (TREE_CODE (*argmin) == INTEGER_CST
+  && TREE_CODE (*argmax) == INTEGER_CST
+  && (dirprec >= argprec
+ || integer_zerop (int_const_binop (RSHIFT_EXPR,
+int_const_binop (MINUS_EXPR,
+ *argmax,
+ *argmin),
+size_int (dirprec)
+  {
+

Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-01 Thread Martin Sebor

On 12/01/2016 02:15 AM, Jakub Jelinek wrote:

On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote:

Isn't this too simplistic?  I mean, if you have say dirtype of signed char
and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg
has range 32, 64, while I think your routine will yield -128, 127 (well,
0 as min and -128 as max as that is what you return for signed type).

Can't you subtract argmax - argmin (best just in wide_int, no need to build
trees), and use what you have just for the case where that number doesn't
fit into the narrower precision, otherwise if argmin - (dirtype) argmin
== argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax
as the range, and in case that it crosses a boundary figure if you can do
anything than the above?  Guess all cases of signed/unsigned dirtype and/or
argtype need to be considered.


Richard noted that you could have a look at CONVERT_EXPR_CODE_P
handling in extract_range_from_unary_expr.  I think it is the
  || (vr0.type == VR_RANGE
  && integer_zerop (int_const_binop (RSHIFT_EXPR,
   int_const_binop (MINUS_EXPR, vr0.max, vr0.min),
 size_int (TYPE_PRECISION (outer_type)))
part that is important here for the narrowing conversion.


Thanks, that was a helpful suggestion!  Attached is an update that
follows the vrp approach.  I assume the infinity stuff is specific
to the VRP pass and not something I need to worry about here.

Martin

PS To your earlier question, argmin and argmax have the same meaning
in the added helper function as in its caller.
PR middle-end/78622 - [7 Regression] -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping

gcc/ChangeLog:

	PR middle-end/78622
	* gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
	rather than res.bounded.
	(adjust_range_for_overflow): New function.
	(format_integer): Always set res.bounded to true unless either
	precision or width is specified and unknown.
	Call adjust_range_for_overflow.
	(format_directive): Remove vestigial quoting.
	(add_bytes): Correct the computation of boundrange used to
	decide whether a warning is of a "maybe" or "defnitely" kind.

gcc/testsuite/ChangeLog:

	PR middle-end/78622
	* gcc.c-torture/execute/pr78622.c: New test.
	* gcc.dg/tree-ssa/builtin-sprintf-2.c: Remove "benign" undefined
	behavior inadvertently introduced in a previous commit.  Tighten
	up final checking.
	* gcc.dg/tree-ssa/builtin-sprintf-5.c: Rename macros for clarity.
	Add test cases.
	* gcc.dg/tree-ssa/builtin-sprintf-6.c: Add test cases.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
	* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: Remove xfails and
	add a final optimization check.
	* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 99a635a..95a8710 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -637,7 +637,7 @@ min_bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result &res)
   if (HOST_WIDE_INT_MAX <= navail)
 return navail;
 
-  if (1 < warn_format_length || res.bounded)
+  if (1 < warn_format_length || res.knownrange)
 {
   /* At level 2, or when all directives output an exact number
 	 of bytes or when their arguments were bounded by known
@@ -795,6 +795,59 @@ get_width_and_precision (const conversion_spec &spec,
   *pprec = prec;
 }
 
+/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
+   argument, due to the conversion from either *ARGMIN or *ARGMAX to
+   the type of the directive's formal argument it's possible for both
+   to result in the same number of bytes or a range of bytes that's
+   less than the number of bytes that would result from formatting
+   some other value in the range [*ARGMIN, *ARGMAX].  This can be
+   determined by checking for the actual argument being in the range
+   of the type of the directive.  If it isn't it must be assumed to
+   take on the full range of the directive's type.
+   Return true when the range has been adjusted to the range of
+   DIRTYPE, false otherwise.  */
+
+static bool
+adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
+{
+  unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin));
+  unsigned dirprec = TYPE_PRECISION (dirtype);
+
+  /* The logic here was inspired/lifted from  the CONVERT_EXPR_CODE_P
+ branch in the extract_range_from_unary_expr function in tree-vrp.c.
+  */
+
+  if (TREE_CODE (*argmin) == INTEGER_CST
+  && TREE_CODE (*argmax) == INTEGER_CST
+  && (dirprec >= argprec
+	  || integer_zerop (int_const_binop (RSHIFT_EXPR,
+	 int_const_binop (MINUS_EXPR,
+			  *argmax,
+			  *argmin),
+	 size_int (dirprec)
+  {
+*argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0, false);
+*argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0, false);
+return false;
+  }
+
+  tree dirmin

Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-12-01 Thread Jakub Jelinek
On Thu, Dec 01, 2016 at 08:26:47AM +0100, Jakub Jelinek wrote:
> Isn't this too simplistic?  I mean, if you have say dirtype of signed char
> and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg
> has range 32, 64, while I think your routine will yield -128, 127 (well,
> 0 as min and -128 as max as that is what you return for signed type).
> 
> Can't you subtract argmax - argmin (best just in wide_int, no need to build
> trees), and use what you have just for the case where that number doesn't
> fit into the narrower precision, otherwise if argmin - (dirtype) argmin
> == argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax
> as the range, and in case that it crosses a boundary figure if you can do
> anything than the above?  Guess all cases of signed/unsigned dirtype and/or
> argtype need to be considered.

Richard noted that you could have a look at CONVERT_EXPR_CODE_P
handling in extract_range_from_unary_expr.  I think it is the
  || (vr0.type == VR_RANGE
  && integer_zerop (int_const_binop (RSHIFT_EXPR,
   int_const_binop (MINUS_EXPR, vr0.max, vr0.min),
 size_int (TYPE_PRECISION (outer_type)))
part that is important here for the narrowing conversion.

Jakub


Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)

2016-11-30 Thread Jakub Jelinek
On Wed, Nov 30, 2016 at 08:26:04PM -0700, Martin Sebor wrote:
> @@ -795,6 +795,43 @@ get_width_and_precision (const conversion_spec &spec,
>*pprec = prec;
>  }
>  
> +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
> +   argument, due to the conversion from either *ARGMIN or *ARGMAX to
> +   the type of the directive's formal argument it's possible for both
> +   to result in the same number of bytes or a range of bytes that's
> +   less than the number of bytes that would result from formatting
> +   some other value in the range [*ARGMIN, *ARGMAX].  This can be
> +   determined by checking for the actual argument being in the range
> +   of the type of the directive.  If it isn't it must be assumed to
> +   take on the full range of the directive's type.
> +   Return true when the range has been adjusted, false otherwise.  */
> +
> +static bool
> +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
> +{
> +  tree dirmin = TYPE_MIN_VALUE (dirtype);
> +  tree dirmax = TYPE_MAX_VALUE (dirtype);
> +
> +  if (tree_int_cst_lt (*argmin, dirmin)
> +  || tree_int_cst_lt (dirmax, *argmin)
> +  || tree_int_cst_lt (*argmax, dirmin)
> +  || tree_int_cst_lt (dirmax, *argmax))
> +{
> +  if (TYPE_UNSIGNED (dirtype))
> + {
> +   *argmin = dirmin;
> +   *argmax = dirmax;
> + }
> +  else
> + {
> +   *argmin = integer_zero_node;
> +   *argmax = dirmin;
> + }
> +  return true;
> +}

Isn't this too simplistic?  I mean, if you have say dirtype of signed char
and argmin say 4096 + 32 and argmax say 4096 + 64, (signed char) arg
has range 32, 64, while I think your routine will yield -128, 127 (well,
0 as min and -128 as max as that is what you return for signed type).

Can't you subtract argmax - argmin (best just in wide_int, no need to build
trees), and use what you have just for the case where that number doesn't
fit into the narrower precision, otherwise if argmin - (dirtype) argmin
== argmax - (dirtype) argmax, just use (dirtype) argmin and (dirtype) argmax
as the range, and in case that it crosses a boundary figure if you can do
anything than the above?  Guess all cases of signed/unsigned dirtype and/or
argtype need to be considered.

Also, is argmin and argmax in this case the actual range (what should go
into res.arg{min,max}), or the values with shortest/longest representation?
Wouldn't it be better to always compute the range of values that can be
printed and only later on (after all VR_RANGE and VR_VARYING handling)
transform that into the number with shortest/longest representation in that
range?  Perhaps even using different variable names for the latter would
make things clearer (argshortest, arglongest or whatever).

Jakub