Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/29/2016 08:22 PM, Martin Sebor wrote: That said, I defer to you on how to proceed here. I'm prepared to do the work(*) but I do worry about jeopardizing the chances of this patch and the others making it into 7.0. So would it make sense to just init/fini the b_o_s framework in your pass and for builtin expansion? I think that should work for the sprintf checking. Let me test it. We can deal with the memxxx and strxxx patch (53562) independently if you prefer. Attached is a modified patch that calls {init,fini}_object_sizes() from the gimple-ssa-sprintf pass instead. While this works fine, I do like the approach of making the calls in a single function better because it makes for a more robust API. Decoupling the init/fini calls from the compute_object_size() function that depends on them having been made makes the API easier to accidentally misuse by calling one while forgetting to call one or both of the other two. Martin gcc-78245.diff PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer gcc/testsuite/ChangeLog: PR middle-end/78245 * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests. gcc/ChangeLog: PR middle-end/78245 * gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size. * tree-object-size.c (addr_object_size): Adjust. (pass_through_call): Adjust. (internal_object_size): New function. (compute_builtin_object_size): Call internal_object_size. (pass_object_sizes::execute): Adjust. * tree-object-size.h (fini_object_sizes): Declare. @@ -664,6 +665,18 @@ compute_builtin_object_size (tree ptr, int object_size_type, return *psize != unknown[object_size_type]; } +/* Compute __builtin_object_size value for PTR and set *PSIZE to + the resulting value. OBJECT_SIZE_TYPE is the second argument + to __builtin_object_size. Return true on success and false + when the object size could not be determined. */ + +bool +compute_builtin_object_size (tree ptr, int object_size_type, +unsigned HOST_WIDE_INT *psize) +{ + return internal_object_size (ptr, object_size_type, psize); +} Is this wrapper still necessary now that we've pulled the init/fini routines out? Seems like it shouldn't be. I think that's the only issue left. If the wrapper is needed, then the patch is fine. If the wrapper isn't, then a patch with the wrapper removed is pre-approved after the usual testing. jeff
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/29/2016 08:22 PM, Martin Sebor wrote: That said, I defer to you on how to proceed here. I'm prepared to do the work(*) but I do worry about jeopardizing the chances of this patch and the others making it into 7.0. So would it make sense to just init/fini the b_o_s framework in your pass and for builtin expansion? I think that should work for the sprintf checking. Let me test it. We can deal with the memxxx and strxxx patch (53562) independently if you prefer. Attached is a modified patch that calls {init,fini}_object_sizes() from the gimple-ssa-sprintf pass instead. While this works fine, I do like the approach of making the calls in a single function better because it makes for a more robust API. Decoupling the init/fini calls from the compute_object_size() function that depends on them having been made makes the API easier to accidentally misuse by calling one while forgetting to call one or both of the other two. It's not ideal, nor is the prospect of caching and potentially not invaliding properly. I've started tackling these kinds problems by wrapping everything into a class with suitable ctors/dtors and methods. With everything locked down inside a class, the only way to access the subsystem is by instantiating a suitable object (which obviously gives us control over init/fini). The problem then boils down to not having that instantiated object live across passes, which usually isn't a problem in GCC :-) I suggested it as a possibility, but wasn't going to demand it without knowing much more about the code in tree-object-size and how well it could be encapsulated. ANyway, I'll take another look at the patch. My recollection was that the only issue at hand was the init/fini/caching aspects. jeff
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
That said, I defer to you on how to proceed here. I'm prepared to do the work(*) but I do worry about jeopardizing the chances of this patch and the others making it into 7.0. So would it make sense to just init/fini the b_o_s framework in your pass and for builtin expansion? I think that should work for the sprintf checking. Let me test it. We can deal with the memxxx and strxxx patch (53562) independently if you prefer. Attached is a modified patch that calls {init,fini}_object_sizes() from the gimple-ssa-sprintf pass instead. While this works fine, I do like the approach of making the calls in a single function better because it makes for a more robust API. Decoupling the init/fini calls from the compute_object_size() function that depends on them having been made makes the API easier to accidentally misuse by calling one while forgetting to call one or both of the other two. Martin PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer gcc/testsuite/ChangeLog: PR middle-end/78245 * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests. gcc/ChangeLog: PR middle-end/78245 * gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size. * tree-object-size.c (addr_object_size): Adjust. (pass_through_call): Adjust. (internal_object_size): New function. (compute_builtin_object_size): Call internal_object_size. (pass_object_sizes::execute): Adjust. * tree-object-size.h (fini_object_sizes): Declare. diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index ead8b0e..34b3723 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -2466,6 +2466,9 @@ get_destination_size (tree dest) a member array as opposed to the whole enclosing object), otherwise use type-zero object size to determine the size of the enclosing object (the function fails without optimization in this type). */ + + init_object_sizes (); + int ost = optimize > 0; unsigned HOST_WIDE_INT size; if (compute_builtin_object_size (dest, ost, &size)) @@ -2800,6 +2803,8 @@ pass_sprintf_length::execute (function *fun) } } + fini_object_sizes (); + return 0; } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c index 8d97fa8..9874332 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c @@ -1,5 +1,10 @@ /* { dg-do compile } */ -/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */ +/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */ +/* Verify that all sprintf built-ins detect overflow involving directives + with non-constant arguments known to be constrained by some range of + values, and even when writing into dynamically allocated buffers. + -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass, + otherwise -O1 is sufficient. */ #ifndef LINE # define LINE 0 @@ -7,18 +12,26 @@ #define bos(x) __builtin_object_size (x, 0) -#define T(bufsize, fmt, ...) \ -do {\ - if (!LINE || __LINE__ == LINE) \ - {\ - char *d = (char *)__builtin_malloc (bufsize); \ - __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__); \ - sink (d); \ - }\ -} while (0) +/* Defined (and redefined) to the allocation function to use, either + malloc, or alloca, or a VLA. */ +#define ALLOC(p, n) (p) = __builtin_malloc (n) -void -sink (void*); +/* Defined (and redefined) to the sprintf function to exercise. */ +#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...) \ + __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__) + +#define T(bufsize, fmt, ...)\ + do { \ +if (!LINE || __LINE__ == LINE) \ + { \ + char *d; \ + ALLOC (d, bufsize);\ + TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__); \ + sink (d); \ + } \ + } while (0) + +void sink (void*); /* Identity function to verify that the checker figures out the value of the operand even when it's not constant (i.e., makes use of @@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b) T ( 4, "%i", Ra (998, 999)); T ( 4, "%i", Ra (999, 1000)); /* { dg-warning "may write a terminating nul past the end of the destination" } */ } + +/* Exercise ordinary sprintf with malloc. */ +#undef TEST_SPRINTF +#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...) \ + __builtin_sprintf (d, fmt, __VA_ARGS__) + +void test_sprintf_malloc (const char *s, const char *t) +{ +#define x x () + + T (1, "%-s", x ? "" : "1"); /* { dg-warning "nul past the end" } */ + T (1, "%-s", x ? "1" : ""); /* { dg-warning "nul past the end" } */ + T (1, "%-s", x ? s : "1");/* { dg-warning "nul past the end" } */ + T (1, "%-s", x ? "1" : s);/* { dg-warning "nul past the end" } */ + T (1, "%-s", x ?
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/23/2016 01:30 PM, Jeff Law wrote: On 11/23/2016 01:09 PM, Martin Sebor wrote: I hadn't thought of extending the gimple-ssa-sprintf pass to all the memxxx and strxxx builtins. The _chk functions are already being handled in builtins.c so calling compute_builtin_object_size for the non-checking ones there and detecting overflow in those was an easy and, I had hoped, non-controversial enhancement to make. In a discussion of bug 77784 (handled in the patch for bug 53562) Jakub also expressed a preference for some of the diagnostics staying in builtins.c. I'm just trying to understand how the pieces fit together. I wasn't aware of Jakub's desire to keep them in builtins.c. After thinking about it a bit it does seem that having all the size and buffer overflow checking (though not necessarily BOS itself) in the same place or pass would make sense. I also suspect that the answer to your question is yes. Range information is pretty bad in the gimple-ssa-sprintf pass (it looks like it runs after EVRP but before VRP). Maybe the pass should run after VRP? Let's investigate this separately rather than draw in additional potential issues. But I do think this is worth investigation. Sounds good. That said, I defer to you on how to proceed here. I'm prepared to do the work(*) but I do worry about jeopardizing the chances of this patch and the others making it into 7.0. So would it make sense to just init/fini the b_o_s framework in your pass and for builtin expansion? I think that should work for the sprintf checking. Let me test it. We can deal with the memxxx and strxxx patch (53562) independently if you prefer. Thanks Martin
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/23/2016 01:09 PM, Martin Sebor wrote: I hadn't thought of extending the gimple-ssa-sprintf pass to all the memxxx and strxxx builtins. The _chk functions are already being handled in builtins.c so calling compute_builtin_object_size for the non-checking ones there and detecting overflow in those was an easy and, I had hoped, non-controversial enhancement to make. In a discussion of bug 77784 (handled in the patch for bug 53562) Jakub also expressed a preference for some of the diagnostics staying in builtins.c. I'm just trying to understand how the pieces fit together. I wasn't aware of Jakub's desire to keep them in builtins.c. I also suspect that the answer to your question is yes. Range information is pretty bad in the gimple-ssa-sprintf pass (it looks like it runs after EVRP but before VRP). Maybe the pass should run after VRP? Let's investigate this separately rather than draw in additional potential issues. But I do think this is worth investigation. That said, I defer to you on how to proceed here. I'm prepared to do the work(*) but I do worry about jeopardizing the chances of this patch and the others making it into 7.0. So would it make sense to just init/fini the b_o_s framework in your pass and for builtin expansion? PS If I understand what you are suggesting this would mean extending the gimple-ssa-sprintf pass to the memxxx and strxxx functions and running the pass later, after VRP. That was my original thought, but I'm certainly not deeply vested in it -- it was primarily to avoid initializing b_o_s an extra time. Jeff
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/23/2016 12:47 PM, Jeff Law wrote: On 11/23/2016 12:32 PM, Martin Sebor wrote: My worry here would be a hash collision. Then we'd be using object sizes from the wrong function. Ah, right, that might explain the ICE I just noticed during Ada bootstrap. Is there some other way to uniquely identify a function? A DECL_UID maybe? DECL_UID would be your best bet. But ISTM that trying to avoid invocations by reusing data from prior passes is likely to be more fragile than recomputing on a per-pass basis -- as long as the number of times we need this stuff is small (as I suspect it is). Isn't the goal here to be able to get format-length warnings when there aren't explicit calls to _b_o_s in the IL? Can't you initialize the object-size framework at the start of your pass and tear it down when your pass is complete? You could do that by exporting the init/fini routines and calling them directly, or by wrapping that in a class and instantiating the class when you need it. That would avoid having to worry about the GC system entirely since you wouldn't have stuff living across passes. Yes, that is the immediate goal of this patch. Beyond it, though, I would like to call this function from anywhere, including during expansion (as is done in my patch for bug 53562 and related). But why not detect the builtins during your pass and check there. ie, I don't see why we necessarily need to have checking and expansion intertwined together. Maybe I'm missing something. Is there something that makes it inherently easier or better to implement checking during builtin expansion? I hadn't thought of extending the gimple-ssa-sprintf pass to all the memxxx and strxxx builtins. The _chk functions are already being handled in builtins.c so calling compute_builtin_object_size for the non-checking ones there and detecting overflow in those was an easy and, I had hoped, non-controversial enhancement to make. In a discussion of bug 77784 (handled in the patch for bug 53562) Jakub also expressed a preference for some of the diagnostics staying in builtins.c. I also suspect that the answer to your question is yes. Range information is pretty bad in the gimple-ssa-sprintf pass (it looks like it runs after EVRP but before VRP). Maybe the pass should run after VRP? That said, I defer to you on how to proceed here. I'm prepared to do the work(*) but I do worry about jeopardizing the chances of this patch and the others making it into 7.0. Martin PS If I understand what you are suggesting this would mean extending the gimple-ssa-sprintf pass to the memxxx and strxxx functions and running the pass later, after VRP.
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/23/2016 12:32 PM, Martin Sebor wrote: My worry here would be a hash collision. Then we'd be using object sizes from the wrong function. Ah, right, that might explain the ICE I just noticed during Ada bootstrap. Is there some other way to uniquely identify a function? A DECL_UID maybe? DECL_UID would be your best bet. But ISTM that trying to avoid invocations by reusing data from prior passes is likely to be more fragile than recomputing on a per-pass basis -- as long as the number of times we need this stuff is small (as I suspect it is). Isn't the goal here to be able to get format-length warnings when there aren't explicit calls to _b_o_s in the IL? Can't you initialize the object-size framework at the start of your pass and tear it down when your pass is complete? You could do that by exporting the init/fini routines and calling them directly, or by wrapping that in a class and instantiating the class when you need it. That would avoid having to worry about the GC system entirely since you wouldn't have stuff living across passes. Yes, that is the immediate goal of this patch. Beyond it, though, I would like to call this function from anywhere, including during expansion (as is done in my patch for bug 53562 and related). But why not detect the builtins during your pass and check there. ie, I don't see why we necessarily need to have checking and expansion intertwined together. Maybe I'm missing something. Is there something that makes it inherently easier or better to implement checking during builtin expansion? Jeff
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/23/2016 12:10 PM, Jeff Law wrote: On 11/23/2016 11:26 AM, Martin Sebor wrote: My only real concern here is that if we call compute_builtin_object_size without having initialized the passes, then we initialize, compute, then finalize. Subsequent calls will go through the same process -- the key being each one re-computes the internal state which might get expensive. Wouldn't it just make more sense to pull up the init/fini calls, either explicitly (which likely means externalizing the init/fini routines) or by wrapping all this stuff in a class and instantiating a suitable object? I think the answer to your memory management question is that internal state is likely not marked as a GC root and thus if you get a GC call pieces of internal state are not seen as reachable, but you still can reference them. ie, you end up with dangling pointers. Usually all you'd have to do is mark them so that gengtype will see them. Bitmaps, trees, rtl, are all good examples. So marking the bitmap would look like: static GTY (()) bitmap computed[4]; Or something like that. You might try --enable-checking=yes,extra,gc,gcac That will be slow, but is often helpful for tracking down cases where someone has an object expected to be live across passes, but it isn't reachable because someone failed to register a GC root. Thanks. Attached is an updated patch that avoids flushing the computed data until the current function changes, and tells the garbage collector not to collect it. I haven't finished bootstrapping and regtesting it yet but running it through Valgrind doesn't show any errors. Please let me know if this is what you had in mind. Thanks Martin gcc-78245.diff PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer gcc/testsuite/ChangeLog: PR middle-end/78245 * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests. gcc/ChangeLog: PR middle-end/78245 * gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY. (get_destination_size): Call compute_object_size. * tree-object-size.c (addr_object_size): Adjust. (pass_through_call): Adjust. (compute_object_size, internal_object_size): New functions. (compute_builtin_object_size): Call internal_object_size. (init_object_sizes): Initialize computed bitmap so the garbage collector knows about it. (fini_object_sizes): Clear the computed bitmap when non-null. (pass_object_sizes::execute): Adjust. * tree-object-size.h (compute_object_size): Declare. diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 1317ad7..39c32e3 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -72,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct object_size_info *, tree, the subobject (innermost array or field with address taken). object_sizes[2] is lower bound for number of bytes till the end of the object and object_sizes[3] lower bound for subobject. */ -static vec object_sizes[4]; +static GTY (()) vec object_sizes[4]; I don't think this needs a GTY marker. /* Bitmaps what object sizes have been computed already. */ -static bitmap computed[4]; +static GTY (()) bitmap computed[4]; This is the one you probably needed :-) +/* Like compute_builtin_object_size but intended to be called + without a corresponding __builtin_object_size in the program. */ + +bool +compute_object_size (tree ptr, int object_size_type, + unsigned HOST_WIDE_INT *psize) +{ + static unsigned lastfunchash; + unsigned curfunchash += IDENTIFIER_HASH_VALUE (DECL_NAME (current_function_decl)); + + /* Initialize the internal data structures for each new function + and keep the computed data around for any subsequent calls to + compute_object_size. */ + if (curfunchash != lastfunchash) My worry here would be a hash collision. Then we'd be using object sizes from the wrong function. Ah, right, that might explain the ICE I just noticed during Ada bootstrap. Is there some other way to uniquely identify a function? A DECL_UID maybe? Isn't the goal here to be able to get format-length warnings when there aren't explicit calls to _b_o_s in the IL? Can't you initialize the object-size framework at the start of your pass and tear it down when your pass is complete? You could do that by exporting the init/fini routines and calling them directly, or by wrapping that in a class and instantiating the class when you need it. That would avoid having to worry about the GC system entirely since you wouldn't have stuff living across passes. Yes, that is the immediate goal of this patch. Beyond it, though, I would like to call this function from anywhere, including during expansion (as is done in my patch for bug 53562 and related). Martin https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00896.html
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/23/2016 11:26 AM, Martin Sebor wrote: My only real concern here is that if we call compute_builtin_object_size without having initialized the passes, then we initialize, compute, then finalize. Subsequent calls will go through the same process -- the key being each one re-computes the internal state which might get expensive. Wouldn't it just make more sense to pull up the init/fini calls, either explicitly (which likely means externalizing the init/fini routines) or by wrapping all this stuff in a class and instantiating a suitable object? I think the answer to your memory management question is that internal state is likely not marked as a GC root and thus if you get a GC call pieces of internal state are not seen as reachable, but you still can reference them. ie, you end up with dangling pointers. Usually all you'd have to do is mark them so that gengtype will see them. Bitmaps, trees, rtl, are all good examples. So marking the bitmap would look like: static GTY (()) bitmap computed[4]; Or something like that. You might try --enable-checking=yes,extra,gc,gcac That will be slow, but is often helpful for tracking down cases where someone has an object expected to be live across passes, but it isn't reachable because someone failed to register a GC root. Thanks. Attached is an updated patch that avoids flushing the computed data until the current function changes, and tells the garbage collector not to collect it. I haven't finished bootstrapping and regtesting it yet but running it through Valgrind doesn't show any errors. Please let me know if this is what you had in mind. Thanks Martin gcc-78245.diff PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer gcc/testsuite/ChangeLog: PR middle-end/78245 * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests. gcc/ChangeLog: PR middle-end/78245 * gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY. (get_destination_size): Call compute_object_size. * tree-object-size.c (addr_object_size): Adjust. (pass_through_call): Adjust. (compute_object_size, internal_object_size): New functions. (compute_builtin_object_size): Call internal_object_size. (init_object_sizes): Initialize computed bitmap so the garbage collector knows about it. (fini_object_sizes): Clear the computed bitmap when non-null. (pass_object_sizes::execute): Adjust. * tree-object-size.h (compute_object_size): Declare. diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c index 1317ad7..39c32e3 100644 --- a/gcc/tree-object-size.c +++ b/gcc/tree-object-size.c @@ -72,10 +74,10 @@ static void check_for_plus_in_loops_1 (struct object_size_info *, tree, the subobject (innermost array or field with address taken). object_sizes[2] is lower bound for number of bytes till the end of the object and object_sizes[3] lower bound for subobject. */ -static vec object_sizes[4]; +static GTY (()) vec object_sizes[4]; I don't think this needs a GTY marker. /* Bitmaps what object sizes have been computed already. */ -static bitmap computed[4]; +static GTY (()) bitmap computed[4]; This is the one you probably needed :-) +/* Like compute_builtin_object_size but intended to be called + without a corresponding __builtin_object_size in the program. */ + +bool +compute_object_size (tree ptr, int object_size_type, +unsigned HOST_WIDE_INT *psize) +{ + static unsigned lastfunchash; + unsigned curfunchash += IDENTIFIER_HASH_VALUE (DECL_NAME (current_function_decl)); + + /* Initialize the internal data structures for each new function + and keep the computed data around for any subsequent calls to + compute_object_size. */ + if (curfunchash != lastfunchash) My worry here would be a hash collision. Then we'd be using object sizes from the wrong function. Isn't the goal here to be able to get format-length warnings when there aren't explicit calls to _b_o_s in the IL? Can't you initialize the object-size framework at the start of your pass and tear it down when your pass is complete? You could do that by exporting the init/fini routines and calling them directly, or by wrapping that in a class and instantiating the class when you need it. That would avoid having to worry about the GC system entirely since you wouldn't have stuff living across passes. Jeff
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
My only real concern here is that if we call compute_builtin_object_size without having initialized the passes, then we initialize, compute, then finalize. Subsequent calls will go through the same process -- the key being each one re-computes the internal state which might get expensive. Wouldn't it just make more sense to pull up the init/fini calls, either explicitly (which likely means externalizing the init/fini routines) or by wrapping all this stuff in a class and instantiating a suitable object? I think the answer to your memory management question is that internal state is likely not marked as a GC root and thus if you get a GC call pieces of internal state are not seen as reachable, but you still can reference them. ie, you end up with dangling pointers. Usually all you'd have to do is mark them so that gengtype will see them. Bitmaps, trees, rtl, are all good examples. So marking the bitmap would look like: static GTY (()) bitmap computed[4]; Or something like that. You might try --enable-checking=yes,extra,gc,gcac That will be slow, but is often helpful for tracking down cases where someone has an object expected to be live across passes, but it isn't reachable because someone failed to register a GC root. Thanks. Attached is an updated patch that avoids flushing the computed data until the current function changes, and tells the garbage collector not to collect it. I haven't finished bootstrapping and regtesting it yet but running it through Valgrind doesn't show any errors. Please let me know if this is what you had in mind. Thanks Martin PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer gcc/testsuite/ChangeLog: PR middle-end/78245 * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests. gcc/ChangeLog: PR middle-end/78245 * gimple-ssa-sprintf.c (object_sizes, computed): Declared GTY. (get_destination_size): Call compute_object_size. * tree-object-size.c (addr_object_size): Adjust. (pass_through_call): Adjust. (compute_object_size, internal_object_size): New functions. (compute_builtin_object_size): Call internal_object_size. (init_object_sizes): Initialize computed bitmap so the garbage collector knows about it. (fini_object_sizes): Clear the computed bitmap when non-null. (pass_object_sizes::execute): Adjust. * tree-object-size.h (compute_object_size): Declare. diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index ead8b0e..ea56570 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -2468,7 +2468,7 @@ get_destination_size (tree dest) object (the function fails without optimization in this type). */ int ost = optimize > 0; unsigned HOST_WIDE_INT size; - if (compute_builtin_object_size (dest, ost, &size)) + if (compute_object_size (dest, ost, &size)) return size; return HOST_WIDE_INT_M1U; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c index 8d97fa8..9874332 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c @@ -1,5 +1,10 @@ /* { dg-do compile } */ -/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */ +/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */ +/* Verify that all sprintf built-ins detect overflow involving directives + with non-constant arguments known to be constrained by some range of + values, and even when writing into dynamically allocated buffers. + -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass, + otherwise -O1 is sufficient. */ #ifndef LINE # define LINE 0 @@ -7,18 +12,26 @@ #define bos(x) __builtin_object_size (x, 0) -#define T(bufsize, fmt, ...) \ -do {\ - if (!LINE || __LINE__ == LINE) \ - {\ - char *d = (char *)__builtin_malloc (bufsize); \ - __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__); \ - sink (d); \ - }\ -} while (0) +/* Defined (and redefined) to the allocation function to use, either + malloc, or alloca, or a VLA. */ +#define ALLOC(p, n) (p) = __builtin_malloc (n) -void -sink (void*); +/* Defined (and redefined) to the sprintf function to exercise. */ +#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...) \ + __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__) + +#define T(bufsize, fmt, ...)\ + do { \ +if (!LINE || __LINE__ == LINE) \ + { \ + char *d; \ + ALLOC (d, bufsize);\ + TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__); \ + sink (d); \ + } \ + } while (0) + +void sink (void*); /* Identity function to verify that the checker figures out the value of the operand even when it's not constant (i.e., makes use of @@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b) T ( 4, "%i", Ra (998,
Re: [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
On 11/08/2016 05:09 PM, Martin Sebor wrote: The -Wformat-length checker relies on the compute_builtin_object_size function to determine the size of the buffer it checks for overflow. The function returns either a size computed by the tree-object-size pass for objects referenced by the __builtin_object_size intrinsic (if it's used in the program) or it tries to compute it for a small subset of expressions otherwise. This subset doesn't include objects allocated by either malloc or alloca, and so for those the function returns "unknown" or (size_t)-1 in the case of -Wformat-length. As a consequence, -Wformat-length is unable to detect overflows involving such objects. The attached patch adds a new function, compute_object_size, that uses the existing algorithms to compute and return the sizes of allocated objects as well, as if they were referenced by __builtin_object_size in the program source, enabling the -Wformat-length checker to detect more buffer overflows. Martin PS The function makes use of the init_function_sizes API that is otherwise unused outside the tree-object-size pass to initialize the internal structures, but then calls fini_object_sizes to release them before returning. That seems wasteful because the size of the same object or one related to it might need to computed again in the context of the same function. I experimented with allocating and releasing the structures only when current_function_decl changes but that led to crashes. I suspect I'm missing something about the management of memory allocated for these structures. Does anyone have any suggestions how to make this work? (Do I perhaps need to allocate them using a special allocator so they don't get garbage collected?) gcc-78245.diff PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer gcc/testsuite/ChangeLog: PR middle-end/78245 * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests. gcc/ChangeLog: PR middle-end/78245 * gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size. * tree-object-size.c (addr_object_size): Adjust. (pass_through_call): Adjust. (compute_object_size, internal_object_size): New functions. (compute_builtin_object_size): Call internal_object_size. (pass_object_sizes::execute): Adjust. * tree-object-size.h (compute_object_size): Declare. Sorry. Just not getting to many of the pre-stage1 close patches as fast as I'd like. My only real concern here is that if we call compute_builtin_object_size without having initialized the passes, then we initialize, compute, then finalize. Subsequent calls will go through the same process -- the key being each one re-computes the internal state which might get expensive. Wouldn't it just make more sense to pull up the init/fini calls, either explicitly (which likely means externalizing the init/fini routines) or by wrapping all this stuff in a class and instantiating a suitable object? I think the answer to your memory management question is that internal state is likely not marked as a GC root and thus if you get a GC call pieces of internal state are not seen as reachable, but you still can reference them. ie, you end up with dangling pointers. Usually all you'd have to do is mark them so that gengtype will see them. Bitmaps, trees, rtl, are all good examples. So marking the bitmap would look like: static GTY (()) bitmap computed[4]; Or something like that. You might try --enable-checking=yes,extra,gc,gcac That will be slow, but is often helpful for tracking down cases where someone has an object expected to be live across passes, but it isn't reachable because someone failed to register a GC root. Jeff
PING 2 [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
Ping. Still looking for a review of the patch below: On 11/16/2016 10:33 AM, Martin Sebor wrote: I'm looking for a review of the patch below: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00779.html Thanks On 11/08/2016 05:09 PM, Martin Sebor wrote: The -Wformat-length checker relies on the compute_builtin_object_size function to determine the size of the buffer it checks for overflow. The function returns either a size computed by the tree-object-size pass for objects referenced by the __builtin_object_size intrinsic (if it's used in the program) or it tries to compute it for a small subset of expressions otherwise. This subset doesn't include objects allocated by either malloc or alloca, and so for those the function returns "unknown" or (size_t)-1 in the case of -Wformat-length. As a consequence, -Wformat-length is unable to detect overflows involving such objects. The attached patch adds a new function, compute_object_size, that uses the existing algorithms to compute and return the sizes of allocated objects as well, as if they were referenced by __builtin_object_size in the program source, enabling the -Wformat-length checker to detect more buffer overflows. Martin PS The function makes use of the init_function_sizes API that is otherwise unused outside the tree-object-size pass to initialize the internal structures, but then calls fini_object_sizes to release them before returning. That seems wasteful because the size of the same object or one related to it might need to computed again in the context of the same function. I experimented with allocating and releasing the structures only when current_function_decl changes but that led to crashes. I suspect I'm missing something about the management of memory allocated for these structures. Does anyone have any suggestions how to make this work? (Do I perhaps need to allocate them using a special allocator so they don't get garbage collected?)
PING [PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
I'm looking for a review of the patch below: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00779.html Thanks On 11/08/2016 05:09 PM, Martin Sebor wrote: The -Wformat-length checker relies on the compute_builtin_object_size function to determine the size of the buffer it checks for overflow. The function returns either a size computed by the tree-object-size pass for objects referenced by the __builtin_object_size intrinsic (if it's used in the program) or it tries to compute it for a small subset of expressions otherwise. This subset doesn't include objects allocated by either malloc or alloca, and so for those the function returns "unknown" or (size_t)-1 in the case of -Wformat-length. As a consequence, -Wformat-length is unable to detect overflows involving such objects. The attached patch adds a new function, compute_object_size, that uses the existing algorithms to compute and return the sizes of allocated objects as well, as if they were referenced by __builtin_object_size in the program source, enabling the -Wformat-length checker to detect more buffer overflows. Martin PS The function makes use of the init_function_sizes API that is otherwise unused outside the tree-object-size pass to initialize the internal structures, but then calls fini_object_sizes to release them before returning. That seems wasteful because the size of the same object or one related to it might need to computed again in the context of the same function. I experimented with allocating and releasing the structures only when current_function_decl changes but that led to crashes. I suspect I'm missing something about the management of memory allocated for these structures. Does anyone have any suggestions how to make this work? (Do I perhaps need to allocate them using a special allocator so they don't get garbage collected?)
[PATCH] enable -Wformat-length for dynamically allocated buffers (pr 78245)
The -Wformat-length checker relies on the compute_builtin_object_size function to determine the size of the buffer it checks for overflow. The function returns either a size computed by the tree-object-size pass for objects referenced by the __builtin_object_size intrinsic (if it's used in the program) or it tries to compute it for a small subset of expressions otherwise. This subset doesn't include objects allocated by either malloc or alloca, and so for those the function returns "unknown" or (size_t)-1 in the case of -Wformat-length. As a consequence, -Wformat-length is unable to detect overflows involving such objects. The attached patch adds a new function, compute_object_size, that uses the existing algorithms to compute and return the sizes of allocated objects as well, as if they were referenced by __builtin_object_size in the program source, enabling the -Wformat-length checker to detect more buffer overflows. Martin PS The function makes use of the init_function_sizes API that is otherwise unused outside the tree-object-size pass to initialize the internal structures, but then calls fini_object_sizes to release them before returning. That seems wasteful because the size of the same object or one related to it might need to computed again in the context of the same function. I experimented with allocating and releasing the structures only when current_function_decl changes but that led to crashes. I suspect I'm missing something about the management of memory allocated for these structures. Does anyone have any suggestions how to make this work? (Do I perhaps need to allocate them using a special allocator so they don't get garbage collected?) PR middle-end/78245 - missing -Wformat-length on an overflow of a dynamically allocated buffer gcc/testsuite/ChangeLog: PR middle-end/78245 * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Add tests. gcc/ChangeLog: PR middle-end/78245 * gimple-ssa-sprintf.c (get_destination_size): Call compute_object_size. * tree-object-size.c (addr_object_size): Adjust. (pass_through_call): Adjust. (compute_object_size, internal_object_size): New functions. (compute_builtin_object_size): Call internal_object_size. (pass_object_sizes::execute): Adjust. * tree-object-size.h (compute_object_size): Declare. diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index 3138ad3..f360711 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -2471,7 +2471,7 @@ get_destination_size (tree dest) object (the function fails without optimization in this type). */ int ost = optimize > 0; unsigned HOST_WIDE_INT size; - if (compute_builtin_object_size (dest, ost, &size)) + if (compute_object_size (dest, ost, &size)) return size; return HOST_WIDE_INT_M1U; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c index 8d97fa8..9874332 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-3.c @@ -1,5 +1,10 @@ /* { dg-do compile } */ -/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */ +/* { dg-options "-O2 -Wformat -Wformat-length=1 -ftrack-macro-expansion=0" } */ +/* Verify that all sprintf built-ins detect overflow involving directives + with non-constant arguments known to be constrained by some range of + values, and even when writing into dynamically allocated buffers. + -O2 (-ftree-vrp) is necessary for the tests involving ranges to pass, + otherwise -O1 is sufficient. */ #ifndef LINE # define LINE 0 @@ -7,18 +12,26 @@ #define bos(x) __builtin_object_size (x, 0) -#define T(bufsize, fmt, ...) \ -do {\ - if (!LINE || __LINE__ == LINE) \ - {\ - char *d = (char *)__builtin_malloc (bufsize); \ - __builtin___sprintf_chk (d, 0, bos (d), fmt, __VA_ARGS__); \ - sink (d); \ - }\ -} while (0) +/* Defined (and redefined) to the allocation function to use, either + malloc, or alloca, or a VLA. */ +#define ALLOC(p, n) (p) = __builtin_malloc (n) -void -sink (void*); +/* Defined (and redefined) to the sprintf function to exercise. */ +#define TEST_SPRINTF(d, maxsize, objsize, fmt, ...) \ + __builtin___sprintf_chk (d, 0, objsize, fmt, __VA_ARGS__) + +#define T(bufsize, fmt, ...)\ + do { \ +if (!LINE || __LINE__ == LINE) \ + { \ + char *d; \ + ALLOC (d, bufsize);\ + TEST_SPRINTF (d, 0, bos (d), fmt, __VA_ARGS__); \ + sink (d); \ + } \ + } while (0) + +void sink (void*); /* Identity function to verify that the checker figures out the value of the operand even when it's not constant (i.e., makes use of @@ -232,3 +245,88 @@ void test_sprintf_chk_range_sshort (signed short *a, signed short *b) T ( 4, "%i", Ra (998, 999)); T ( 4, "%i", Ra (999, 1000)); /* { dg-warning "may write a terminating nul past the end of