Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-12-02 Thread Jeff Law

On 11/29/2016 08:22 PM, Martin Sebor wrote:

That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.

So would it make sense to just init/fini the b_o_s framework in your
pass and for builtin expansion?


I think that should work for the sprintf checking.  Let me test it.
We can deal with the memxxx and strxxx patch (53562) independently
if you prefer.


Attached is a modified patch that calls {init,fini}_object_sizes()
from the gimple-ssa-sprintf pass instead.

While this works fine, I do like the approach of making the calls
in a single function better because it makes for a more robust API.
Decoupling the init/fini calls from the compute_object_size()
function that depends on them having been made makes the API easier
to accidentally misuse by calling one while forgetting to call one
or both of the other two.

Martin


gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically 
allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (get_destination_size): Call
compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(internal_object_size): New function.
(compute_builtin_object_size): Call internal_object_size.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (fini_object_sizes): Declare.

@@ -664,6 +665,18 @@ compute_builtin_object_size (tree ptr, int 
object_size_type,
   return *psize != unknown[object_size_type];
 }

+/* Compute __builtin_object_size value for PTR and set *PSIZE to
+   the resulting value.  OBJECT_SIZE_TYPE is the second argument
+   to __builtin_object_size.  Return true on success and false
+   when the object size could not be determined.  */
+
+bool
+compute_builtin_object_size (tree ptr, int object_size_type,
+unsigned HOST_WIDE_INT *psize)
+{
+  return internal_object_size (ptr, object_size_type, psize);
+}
Is this wrapper still necessary now that we've pulled the init/fini 
routines out?  Seems like it shouldn't be.



I think that's the only issue left.  If the wrapper is needed, then the 
patch is fine.  If the wrapper isn't, then a patch with the wrapper 
removed is pre-approved after the usual testing.


jeff


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-12-02 Thread Jeff Law

On 11/29/2016 08:22 PM, Martin Sebor wrote:

That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.

So would it make sense to just init/fini the b_o_s framework in your
pass and for builtin expansion?


I think that should work for the sprintf checking.  Let me test it.
We can deal with the memxxx and strxxx patch (53562) independently
if you prefer.


Attached is a modified patch that calls {init,fini}_object_sizes()
from the gimple-ssa-sprintf pass instead.

While this works fine, I do like the approach of making the calls
in a single function better because it makes for a more robust API.
Decoupling the init/fini calls from the compute_object_size()
function that depends on them having been made makes the API easier
to accidentally misuse by calling one while forgetting to call one
or both of the other two.
It's not ideal, nor is the prospect of caching and potentially not 
invaliding properly.


I've started tackling these kinds problems by wrapping everything into a 
class with suitable ctors/dtors and methods.  With everything locked 
down inside a class, the only way to access the subsystem is by 
instantiating a suitable object (which obviously gives us control over 
init/fini).  The problem then boils down to not having that instantiated 
object live across passes, which usually isn't a problem in GCC :-)


I suggested it as a possibility, but wasn't going to demand it without 
knowing much more about the code in tree-object-size and how well it 
could be encapsulated.


ANyway, I'll take another look at the patch.  My recollection was that 
the only issue at hand was the init/fini/caching aspects.



jeff



Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-29 Thread Martin Sebor

That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.

So would it make sense to just init/fini the b_o_s framework in your
pass and for builtin expansion?


I think that should work for the sprintf checking.  Let me test it.
We can deal with the memxxx and strxxx patch (53562) independently
if you prefer.


Attached is a modified patch that calls {init,fini}_object_sizes()
from the gimple-ssa-sprintf pass instead.

While this works fine, I do like the approach of making the calls
in a single function better because it makes for a more robust API.
Decoupling the init/fini calls from the compute_object_size()
function that depends on them having been made makes the API easier
to accidentally misuse by calling one while forgetting to call one
or both of the other two.

Martin

PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer

gcc/testsuite/ChangeLog:

	PR middle-end/78245
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

	PR middle-end/78245
	* gimple-ssa-sprintf.c (get_destination_size): Call
	compute_object_size.
	* tree-object-size.c (addr_object_size): Adjust.
	(pass_through_call): Adjust.
	(internal_object_size): New function.
	(compute_builtin_object_size): Call internal_object_size.
	(pass_object_sizes::execute): Adjust.
	* tree-object-size.h (fini_object_sizes): Declare.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index ead8b0e..34b3723 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2466,6 +2466,9 @@ get_destination_size (tree dest)
  a member array as opposed to the whole enclosing object), otherwise
  use type-zero object size to determine the size of the enclosing
  object (the function fails without optimization in this type).  */
+
+  init_object_sizes ();
+
   int ost = optimize > 0;
   unsigned HOST_WIDE_INT size;
   if (compute_builtin_object_size (dest, ost, &size))
@@ -2800,6 +2803,8 @@ pass_sprintf_length::execute (function *fun)
 	}
 }
 
+  fini_object_sizes ();
+
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
index 8d97fa8..9874332 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
@@ -1,5 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* Verify that all sprintf built-ins detect overflow involving directives
+   with non-constant arguments known to be constrained by some range of
+   values, and even when writing into dynamically allocated buffers.
+   -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass,
+   otherwise -O1 is sufficient.  */
 
 #ifndef LINE
 #  define LINE 0
@@ -7,18 +12,26 @@
 
 #define bos(x) __builtin_object_size (x, 0)
 
-#define T(bufsize, fmt, ...)		\
-do {\
-  if (!LINE || __LINE__ == LINE)	\
-	{\
-	  char *d = (char *)__builtin_malloc (bufsize);			\
-	  __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__);	\
-	  sink (d);			\
-	}\
-} while (0)
+/* Defined (and redefined) to the allocation function to use, either
+   malloc, or alloca, or a VLA.  */
+#define ALLOC(p, n)   (p) = __builtin_malloc (n)
 
-void
-sink (void*);
+/* Defined (and redefined) to the sprintf function to exercise.  */
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)		\
+  __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__)
+
+#define T(bufsize, fmt, ...)\
+  do {			\
+if (!LINE || __LINE__ == LINE)			\
+  {			\
+	char *d;	\
+	ALLOC (d, bufsize);\
+	TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__);	\
+	sink (d);	\
+  }			\
+  } while (0)
+
+void sink (void*);
 
 /* Identity function to verify that the checker figures out the value
of the operand even when it's not constant (i.e., makes use of
@@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b)
   T ( 4, "%i",  Ra (998,  999));
   T ( 4, "%i",  Ra (999, 1000)); /* { dg-warning "may write a terminating nul past the end of the destination" } */
 }
+
+/* Exercise ordinary sprintf with malloc.  */
+#undef TEST_SPRINTF
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)	\
+  __builtin_sprintf (d, fmt, __VA_ARGS__)
+
+void test_sprintf_malloc (const char *s, const char *t)
+{
+#define x x ()
+
+  T (1, "%-s", x ? "" : "1");   /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : "");   /* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? s : "1");/* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ? "1" : s);/* { dg-warning "nul past the end" } */
+  T (1, "%-s", x ?

Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

On 11/23/2016 01:30 PM, Jeff Law wrote:

On 11/23/2016 01:09 PM, Martin Sebor wrote:


I hadn't thought of extending the gimple-ssa-sprintf pass to all
the memxxx and strxxx builtins.  The _chk functions are already
being handled in builtins.c so calling compute_builtin_object_size
for the non-checking ones there and detecting overflow in those
was an easy and, I had hoped, non-controversial enhancement to make.
In a discussion of bug 77784 (handled in the patch for bug 53562)
Jakub also expressed a preference for some of the diagnostics
staying in builtins.c.

I'm just trying to understand how the pieces fit together.  I wasn't
aware of Jakub's desire to keep them in builtins.c.


After thinking about it a bit it does seem that having all the size
and buffer overflow checking (though not necessarily BOS itself) in
the same place or pass would make sense.


I also suspect that the answer to your question is yes.  Range
information is pretty bad in the gimple-ssa-sprintf pass (it looks
like it runs after EVRP but before VRP).  Maybe the pass should run
after VRP?

Let's investigate this separately rather than draw in additional
potential issues.  But I do think this is worth investigation.


Sounds good.





That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.

So would it make sense to just init/fini the b_o_s framework in your
pass and for builtin expansion?


I think that should work for the sprintf checking.  Let me test it.
We can deal with the memxxx and strxxx patch (53562) independently
if you prefer.

Thanks
Martin


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Jeff Law

On 11/23/2016 01:09 PM, Martin Sebor wrote:


I hadn't thought of extending the gimple-ssa-sprintf pass to all
the memxxx and strxxx builtins.  The _chk functions are already
being handled in builtins.c so calling compute_builtin_object_size
for the non-checking ones there and detecting overflow in those
was an easy and, I had hoped, non-controversial enhancement to make.
In a discussion of bug 77784 (handled in the patch for bug 53562)
Jakub also expressed a preference for some of the diagnostics
staying in builtins.c.
I'm just trying to understand how the pieces fit together.  I wasn't 
aware of Jakub's desire to keep them in builtins.c.




I also suspect that the answer to your question is yes.  Range
information is pretty bad in the gimple-ssa-sprintf pass (it looks
like it runs after EVRP but before VRP).  Maybe the pass should run
after VRP?
Let's investigate this separately rather than draw in additional 
potential issues.  But I do think this is worth investigation.




That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.
So would it make sense to just init/fini the b_o_s framework in your 
pass and for builtin expansion?



PS If I understand what you are suggesting this would mean
extending the gimple-ssa-sprintf pass to the memxxx and strxxx
functions and running the pass later, after VRP.
That was my original thought, but I'm certainly not deeply vested in it 
-- it was primarily to avoid initializing b_o_s an extra time.


Jeff



Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

On 11/23/2016 12:47 PM, Jeff Law wrote:

On 11/23/2016 12:32 PM, Martin Sebor wrote:


My worry here would be a hash collision.  Then we'd be using object
sizes from the wrong function.


Ah, right, that might explain the ICE I just noticed during Ada
bootstrap.  Is there some other way to uniquely identify a function?
A DECL_UID maybe?

DECL_UID would be your best bet.  But ISTM that trying to avoid
invocations by reusing data from prior passes is likely to be more
fragile than recomputing on a per-pass basis -- as long as the number of
times we need this stuff is small (as I suspect it is).





Isn't the goal here to be able to get format-length warnings when there
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the
object-size framework at the start of your pass and tear it down when
your pass is complete?  You could do that by exporting the init/fini
routines and calling them directly, or by wrapping that in a class and
instantiating the class when you need it.

That would avoid having to worry about the GC system entirely since you
wouldn't have stuff living across passes.


Yes, that is the immediate goal of this patch.  Beyond it, though,
I would like to call this function from anywhere, including during
expansion (as is done in my patch for bug 53562 and related).

But why not detect the builtins during your pass and check there.  ie, I
don't see why we necessarily need to have checking and expansion
intertwined together.  Maybe I'm missing something.  Is there something
that makes it inherently easier or better to implement checking during
builtin expansion?


I hadn't thought of extending the gimple-ssa-sprintf pass to all
the memxxx and strxxx builtins.  The _chk functions are already
being handled in builtins.c so calling compute_builtin_object_size
for the non-checking ones there and detecting overflow in those
was an easy and, I had hoped, non-controversial enhancement to make.
In a discussion of bug 77784 (handled in the patch for bug 53562)
Jakub also expressed a preference for some of the diagnostics
staying in builtins.c.

I also suspect that the answer to your question is yes.  Range
information is pretty bad in the gimple-ssa-sprintf pass (it looks
like it runs after EVRP but before VRP).  Maybe the pass should run
after VRP?

That said, I defer to you on how to proceed here.  I'm prepared
to do the work(*) but I do worry about jeopardizing the chances
of this patch and the others making it into 7.0.

Martin

PS If I understand what you are suggesting this would mean
extending the gimple-ssa-sprintf pass to the memxxx and strxxx
functions and running the pass later, after VRP.


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Jeff Law

On 11/23/2016 12:32 PM, Martin Sebor wrote:


My worry here would be a hash collision.  Then we'd be using object
sizes from the wrong function.


Ah, right, that might explain the ICE I just noticed during Ada
bootstrap.  Is there some other way to uniquely identify a function?
A DECL_UID maybe?
DECL_UID would be your best bet.  But ISTM that trying to avoid 
invocations by reusing data from prior passes is likely to be more 
fragile than recomputing on a per-pass basis -- as long as the number of 
times we need this stuff is small (as I suspect it is).






Isn't the goal here to be able to get format-length warnings when there
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the
object-size framework at the start of your pass and tear it down when
your pass is complete?  You could do that by exporting the init/fini
routines and calling them directly, or by wrapping that in a class and
instantiating the class when you need it.

That would avoid having to worry about the GC system entirely since you
wouldn't have stuff living across passes.


Yes, that is the immediate goal of this patch.  Beyond it, though,
I would like to call this function from anywhere, including during
expansion (as is done in my patch for bug 53562 and related).
But why not detect the builtins during your pass and check there.  ie, I 
don't see why we necessarily need to have checking and expansion 
intertwined together.  Maybe I'm missing something.  Is there something 
that makes it inherently easier or better to implement checking during 
builtin expansion?


Jeff



Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

On 11/23/2016 12:10 PM, Jeff Law wrote:

On 11/23/2016 11:26 AM, Martin Sebor wrote:

My only real concern here is that if we call compute_builtin_object_size
without having initialized the passes, then we initialize, compute, then
finalize.  Subsequent calls will go through the same process -- the key
being each one re-computes the internal state which might get expensive.

Wouldn't it just make more sense to pull up the init/fini calls, either
explicitly (which likely means externalizing the init/fini routines) or
by wrapping all this stuff in a class and instantiating a suitable
object?

I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus if you get a GC call
pieces of internal state are not seen as reachable, but you still can
reference them.  ie, you end up with dangling pointers.

Usually all you'd have to do is mark them so that gengtype will see
them.  Bitmaps, trees, rtl, are all good examples.  So marking the
bitmap would look like:

static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where
someone has an object expected to be live across passes, but it isn't
reachable because someone failed to register a GC root.


Thanks.  Attached is an updated patch that avoids flushing the computed
data until the current function changes, and tells the garbage collector
not to collect it.  I haven't finished bootstrapping and regtesting it
yet but running it through Valgrind doesn't show any errors.  Please
let me know if this is what you had in mind.

Thanks
Martin

gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a
dynamically allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY.
(get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(init_object_sizes): Initialize computed bitmap so the garbage
collector knows about it.
(fini_object_sizes): Clear the computed bitmap when non-null.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 1317ad7..39c32e3 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -72,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct
object_size_info *, tree,
the subobject (innermost array or field with address taken).
object_sizes[2] is lower bound for number of bytes till the end of
the object and object_sizes[3] lower bound for subobject.  */
-static vec object_sizes[4];
+static GTY (()) vec object_sizes[4];

I don't think this needs a GTY marker.


 /* Bitmaps what object sizes have been computed already.  */
-static bitmap computed[4];
+static GTY (()) bitmap computed[4];

This is the one you probably needed :-)



+/* Like compute_builtin_object_size but intended to be called
+   without a corresponding __builtin_object_size in the program.  */
+
+bool
+compute_object_size (tree ptr, int object_size_type,
+ unsigned HOST_WIDE_INT *psize)
+{
+  static unsigned lastfunchash;
+  unsigned curfunchash
+= IDENTIFIER_HASH_VALUE (DECL_NAME (current_function_decl));
+
+  /* Initialize the internal data structures for each new function
+ and keep the computed data around for any subsequent calls to
+ compute_object_size.  */
+  if (curfunchash != lastfunchash)

My worry here would be a hash collision.  Then we'd be using object
sizes from the wrong function.


Ah, right, that might explain the ICE I just noticed during Ada
bootstrap.  Is there some other way to uniquely identify a function?
A DECL_UID maybe?



Isn't the goal here to be able to get format-length warnings when there
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the
object-size framework at the start of your pass and tear it down when
your pass is complete?  You could do that by exporting the init/fini
routines and calling them directly, or by wrapping that in a class and
instantiating the class when you need it.

That would avoid having to worry about the GC system entirely since you
wouldn't have stuff living across passes.


Yes, that is the immediate goal of this patch.  Beyond it, though,
I would like to call this function from anywhere, including during
expansion (as is done in my patch for bug 53562 and related).

Martin

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00896.html


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Jeff Law

On 11/23/2016 11:26 AM, Martin Sebor wrote:

My only real concern here is that if we call compute_builtin_object_size
without having initialized the passes, then we initialize, compute, then
finalize.  Subsequent calls will go through the same process -- the key
being each one re-computes the internal state which might get expensive.

Wouldn't it just make more sense to pull up the init/fini calls, either
explicitly (which likely means externalizing the init/fini routines) or
by wrapping all this stuff in a class and instantiating a suitable
object?

I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus if you get a GC call
pieces of internal state are not seen as reachable, but you still can
reference them.  ie, you end up with dangling pointers.

Usually all you'd have to do is mark them so that gengtype will see
them.  Bitmaps, trees, rtl, are all good examples.  So marking the
bitmap would look like:

static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where
someone has an object expected to be live across passes, but it isn't
reachable because someone failed to register a GC root.


Thanks.  Attached is an updated patch that avoids flushing the computed
data until the current function changes, and tells the garbage collector
not to collect it.  I haven't finished bootstrapping and regtesting it
yet but running it through Valgrind doesn't show any errors.  Please
let me know if this is what you had in mind.

Thanks
Martin

gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically 
allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY.
(get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(init_object_sizes): Initialize computed bitmap so the garbage
collector knows about it.
(fini_object_sizes): Clear the computed bitmap when non-null.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 1317ad7..39c32e3 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -72,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct 
object_size_info *, tree,
the subobject (innermost array or field with address taken).
object_sizes[2] is lower bound for number of bytes till the end of
the object and object_sizes[3] lower bound for subobject.  */
-static vec object_sizes[4];
+static GTY (()) vec object_sizes[4];

I don't think this needs a GTY marker.


 /* Bitmaps what object sizes have been computed already.  */
-static bitmap computed[4];
+static GTY (()) bitmap computed[4];

This is the one you probably needed :-)



+/* Like compute_builtin_object_size but intended to be called
+   without a corresponding __builtin_object_size in the program.  */
+
+bool
+compute_object_size (tree ptr, int object_size_type,
+unsigned HOST_WIDE_INT *psize)
+{
+  static unsigned lastfunchash;
+  unsigned curfunchash
+= IDENTIFIER_HASH_VALUE (DECL_NAME (current_function_decl));
+
+  /* Initialize the internal data structures for each new function
+ and keep the computed data around for any subsequent calls to
+ compute_object_size.  */
+  if (curfunchash != lastfunchash)
My worry here would be a hash collision.  Then we'd be using object 
sizes from the wrong function.



Isn't the goal here to be able to get format-length warnings when there 
aren't explicit calls to _b_o_s in the IL?   Can't you initialize the 
object-size framework at the start of your pass and tear it down when 
your pass is complete?  You could do that by exporting the init/fini 
routines and calling them directly, or by wrapping that in a class and 
instantiating the class when you need it.


That would avoid having to worry about the GC system entirely since you 
wouldn't have stuff living across passes.


Jeff


Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-23 Thread Martin Sebor

My only real concern here is that if we call compute_builtin_object_size
without having initialized the passes, then we initialize, compute, then
finalize.  Subsequent calls will go through the same process -- the key
being each one re-computes the internal state which might get expensive.

Wouldn't it just make more sense to pull up the init/fini calls, either
explicitly (which likely means externalizing the init/fini routines) or
by wrapping all this stuff in a class and instantiating a suitable object?

I think the answer to your memory management question is that internal
state is likely not marked as a GC root and thus if you get a GC call
pieces of internal state are not seen as reachable, but you still can
reference them.  ie, you end up with dangling pointers.

Usually all you'd have to do is mark them so that gengtype will see
them.  Bitmaps, trees, rtl, are all good examples.  So marking the
bitmap would look like:

static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where
someone has an object expected to be live across passes, but it isn't
reachable because someone failed to register a GC root.


Thanks.  Attached is an updated patch that avoids flushing the computed
data until the current function changes, and tells the garbage collector
not to collect it.  I haven't finished bootstrapping and regtesting it
yet but running it through Valgrind doesn't show any errors.  Please
let me know if this is what you had in mind.

Thanks
Martin
PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer

gcc/testsuite/ChangeLog:

	PR middle-end/78245
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

	PR middle-end/78245
	* gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY.
	(get_destination_size): Call compute_object_size.
	* tree-object-size.c (addr_object_size): Adjust.
	(pass_through_call): Adjust.
	(compute_object_size, internal_object_size): New functions.
	(compute_builtin_object_size): Call internal_object_size.
	(init_object_sizes): Initialize computed bitmap so the garbage
	collector knows about it.
	(fini_object_sizes): Clear the computed bitmap when non-null.
	(pass_object_sizes::execute): Adjust.
	* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index ead8b0e..ea56570 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2468,7 +2468,7 @@ get_destination_size (tree dest)
  object (the function fails without optimization in this type).  */
   int ost = optimize > 0;
   unsigned HOST_WIDE_INT size;
-  if (compute_builtin_object_size (dest, ost, &size))
+  if (compute_object_size (dest, ost, &size))
 return size;
 
   return HOST_WIDE_INT_M1U;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
index 8d97fa8..9874332 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
@@ -1,5 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* Verify that all sprintf built-ins detect overflow involving directives
+   with non-constant arguments known to be constrained by some range of
+   values, and even when writing into dynamically allocated buffers.
+   -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass,
+   otherwise -O1 is sufficient.  */
 
 #ifndef LINE
 #  define LINE 0
@@ -7,18 +12,26 @@
 
 #define bos(x) __builtin_object_size (x, 0)
 
-#define T(bufsize, fmt, ...)		\
-do {\
-  if (!LINE || __LINE__ == LINE)	\
-	{\
-	  char *d = (char *)__builtin_malloc (bufsize);			\
-	  __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__);	\
-	  sink (d);			\
-	}\
-} while (0)
+/* Defined (and redefined) to the allocation function to use, either
+   malloc, or alloca, or a VLA.  */
+#define ALLOC(p, n)   (p) = __builtin_malloc (n)
 
-void
-sink (void*);
+/* Defined (and redefined) to the sprintf function to exercise.  */
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)		\
+  __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__)
+
+#define T(bufsize, fmt, ...)\
+  do {			\
+if (!LINE || __LINE__ == LINE)			\
+  {			\
+	char *d;	\
+	ALLOC (d, bufsize);\
+	TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__);	\
+	sink (d);	\
+  }			\
+  } while (0)
+
+void sink (void*);
 
 /* Identity function to verify that the checker figures out the value
of the operand even when it's not constant (i.e., makes use of
@@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b)
   T ( 4, "%i",  Ra (998,  

Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-22 Thread Jeff Law

On 11/08/2016 05:09 PM, Martin Sebor wrote:

The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.

Martin

PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)

gcc-78245.diff


PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically 
allocated buffer

gcc/testsuite/ChangeLog:

PR middle-end/78245
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

PR middle-end/78245
* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
* tree-object-size.c (addr_object_size): Adjust.
(pass_through_call): Adjust.
(compute_object_size, internal_object_size): New functions.
(compute_builtin_object_size): Call internal_object_size.
(pass_object_sizes::execute): Adjust.
* tree-object-size.h (compute_object_size): Declare.
Sorry.  Just not getting to many of the pre-stage1 close patches as fast 
as I'd like.


My only real concern here is that if we call compute_builtin_object_size 
without having initialized the passes, then we initialize, compute, then 
finalize.  Subsequent calls will go through the same process -- the key 
being each one re-computes the internal state which might get expensive.


Wouldn't it just make more sense to pull up the init/fini calls, either 
explicitly (which likely means externalizing the init/fini routines) or 
by wrapping all this stuff in a class and instantiating a suitable object?


I think the answer to your memory management question is that internal 
state is likely not marked as a GC root and thus if you get a GC call 
pieces of internal state are not seen as reachable, but you still can 
reference them.  ie, you end up with dangling pointers.


Usually all you'd have to do is mark them so that gengtype will see 
them.  Bitmaps, trees, rtl, are all good examples.  So marking the 
bitmap would look like:


static GTY (()) bitmap computed[4];

Or something like that.

You might try --enable-checking=yes,extra,gc,gcac

That will be slow, but is often helpful for tracking down cases where 
someone has an object expected to be live across passes, but it isn't 
reachable because someone failed to register a GC root.


Jeff


PING 2 [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-21 Thread Martin Sebor

Ping.  Still looking for a review of the patch below:

On 11/16/2016 10:33 AM, Martin Sebor wrote:

I'm looking for a review of the patch below:

  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00779.html

Thanks

On 11/08/2016 05:09 PM, Martin Sebor wrote:

The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.

Martin

PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)






PING [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-16 Thread Martin Sebor

I'm looking for a review of the patch below:

  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00779.html

Thanks

On 11/08/2016 05:09 PM, Martin Sebor wrote:

The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.

Martin

PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)




[PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)

2016-11-08 Thread Martin Sebor

The -Wformat-length checker relies on the compute_builtin_object_size
function to determine the size of the buffer it checks for overflow.
The function returns either a size computed by the tree-object-size
pass for objects referenced by the __builtin_object_size intrinsic
(if it's used in the program) or it tries to compute it for a small
subset of expressions otherwise.  This subset doesn't include objects
allocated by either malloc or alloca, and so for those the function
returns "unknown" or (size_t)-1 in the case of -Wformat-length.  As
a consequence, -Wformat-length is unable to detect overflows
involving such objects.

The attached patch adds a new function, compute_object_size, that
uses the existing algorithms to compute and return the sizes of
allocated objects as well, as if they were referenced by
__builtin_object_size in the program source, enabling the
-Wformat-length checker to detect more buffer overflows.

Martin

PS The function makes use of the init_function_sizes API that is
otherwise unused outside the tree-object-size pass to initialize
the internal structures, but then calls fini_object_sizes to
release them before returning.  That seems wasteful because
the size of the same object or one related to it might need
to computed again in the context of the same function.  I
experimented with allocating and releasing the structures only
when current_function_decl changes but that led to crashes.
I suspect I'm missing something about the management of memory
allocated for these structures.  Does anyone have any suggestions
how to make this work?  (Do I perhaps need to allocate them using
a special allocator so they don't get garbage collected?)
PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer

gcc/testsuite/ChangeLog:

	PR middle-end/78245
	* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests.

gcc/ChangeLog:

	PR middle-end/78245
	* gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size.
	* tree-object-size.c (addr_object_size): Adjust.
	(pass_through_call): Adjust.
	(compute_object_size, internal_object_size): New functions.
	(compute_builtin_object_size): Call internal_object_size.
	(pass_object_sizes::execute): Adjust.
	* tree-object-size.h (compute_object_size): Declare.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 3138ad3..f360711 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2471,7 +2471,7 @@ get_destination_size (tree dest)
  object (the function fails without optimization in this type).  */
   int ost = optimize > 0;
   unsigned HOST_WIDE_INT size;
-  if (compute_builtin_object_size (dest, ost, &size))
+  if (compute_object_size (dest, ost, &size))
 return size;
 
   return HOST_WIDE_INT_M1U;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
index 8d97fa8..9874332 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
@@ -1,5 +1,10 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */
+/* Verify that all sprintf built-ins detect overflow involving directives
+   with non-constant arguments known to be constrained by some range of
+   values, and even when writing into dynamically allocated buffers.
+   -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass,
+   otherwise -O1 is sufficient.  */
 
 #ifndef LINE
 #  define LINE 0
@@ -7,18 +12,26 @@
 
 #define bos(x) __builtin_object_size (x, 0)
 
-#define T(bufsize, fmt, ...)		\
-do {\
-  if (!LINE || __LINE__ == LINE)	\
-	{\
-	  char *d = (char *)__builtin_malloc (bufsize);			\
-	  __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__);	\
-	  sink (d);			\
-	}\
-} while (0)
+/* Defined (and redefined) to the allocation function to use, either
+   malloc, or alloca, or a VLA.  */
+#define ALLOC(p, n)   (p) = __builtin_malloc (n)
 
-void
-sink (void*);
+/* Defined (and redefined) to the sprintf function to exercise.  */
+#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...)		\
+  __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__)
+
+#define T(bufsize, fmt, ...)\
+  do {			\
+if (!LINE || __LINE__ == LINE)			\
+  {			\
+	char *d;	\
+	ALLOC (d, bufsize);\
+	TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__);	\
+	sink (d);	\
+  }			\
+  } while (0)
+
+void sink (void*);
 
 /* Identity function to verify that the checker figures out the value
of the operand even when it's not constant (i.e., makes use of
@@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b)
   T ( 4, "%i",  Ra (998,  999));
   T ( 4, "%i",  Ra (999, 1000)); /* { dg-warning "may write a terminating nul past the end of