On Thu, Nov 29, 2018 at 9:34 PM Martin Sebor <mse...@gmail.com> wrote: > > On 11/16/2018 02:07 AM, Richard Biener wrote: > > On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <mse...@gmail.com> wrote: > >> > >> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html > >> > >> Please let me know if there is something I need to change here > >> to make the fix acceptable or if I should stop trying. > > > > I have one more comment about > > > > + /* Defer warning (and folding) until the next statement in the basic > > + block is reachable. */ > > + if (!gimple_bb (stmt)) > > + return false; > > + > > > > it's not about the next statement in the basic-block being "reachable" > > (even w/o a CFG you can use gsi_next()) but rather that the next > > stmt isn't yet gimplified and thus not inserted into the gimple sequence, > > right? > > No, it's about the current statement not being associated with > a basic block yet when the warning code runs for the first time > (during gimplify_expr), and so gsi_next() returning null. > > > You apply this to gimple_fold_builtin_strncpy but I'd rather > > see us not sprinkling this over gimple-fold.c but instead do this > > in gimplify.c:maybe_fold_stmt, delaying folding until say lowering. > > > > See the attached (untested). > > I would also prefer this solution. I had tested it (in response > to you first mentioning it back in September) and it causes quite > a bit of fallout in tests that look for the folding to take place > very early. See the end of my reply here: > > https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html > > But I'm willing to do the test suite cleanup if you think it's > suitable for GCC 9. (If you're thinking GCC 10 please let me > know now.)
I very much prefer that to the hacks in gimple-fold.c if it doesn't help now then I'll rather live with some bogus warnings for GCC 9 and fix it up properly for GCC 10. I expect the fallout to be quite minimal (also considering my suggestion to do the folding in gimple-low.c). Richard. > Thanks > Martin > > > > > Richard. > > > > > > > >> On 10/31/2018 10:33 AM, Martin Sebor wrote: > >>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html > >>> > >>> On 10/20/2018 06:01 PM, Martin Sebor wrote: > >>>> On 10/16/2018 03:21 PM, Jeff Law wrote: > >>>>> On 10/4/18 9:51 AM, Martin Sebor wrote: > >>>>>> On 10/04/2018 08:58 AM, Jeff Law wrote: > >>>>>>> On 8/27/18 9:42 AM, Richard Biener wrote: > >>>>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <l...@redhat.com> wrote: > >>>>>>>>> > >>>>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote: > >>>>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <l...@redhat.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote: > >>>>>>>>>>>> The warning suppression for -Wstringop-truncation looks for > >>>>>>>>>>>> the next statement after a truncating strncpy to see if it > >>>>>>>>>>>> adds a terminating nul. This only works when the next > >>>>>>>>>>>> statement can be reached using the Gimple statement iterator > >>>>>>>>>>>> which isn't until after gimplification. As a result, strncpy > >>>>>>>>>>>> calls that truncate their constant argument that are being > >>>>>>>>>>>> folded to memcpy this early get diagnosed even if they are > >>>>>>>>>>>> followed by the nul assignment: > >>>>>>>>>>>> > >>>>>>>>>>>> const char s[] = "12345"; > >>>>>>>>>>>> char d[3]; > >>>>>>>>>>>> > >>>>>>>>>>>> void f (void) > >>>>>>>>>>>> { > >>>>>>>>>>>> strncpy (d, s, sizeof d - 1); // -Wstringop-truncation > >>>>>>>>>>>> d[sizeof d - 1] = 0; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> To avoid the warning I propose to defer folding strncpy to > >>>>>>>>>>>> memcpy until the pointer to the basic block the strnpy call > >>>>>>>>>>>> is in can be used to try to reach the next statement (this > >>>>>>>>>>>> happens as early as ccp1). I'm aware of the preference to > >>>>>>>>>>>> fold things early but in the case of strncpy (a relatively > >>>>>>>>>>>> rarely used function that is often misused), getting > >>>>>>>>>>>> the warning right while folding a bit later but still fairly > >>>>>>>>>>>> early on seems like a reasonable compromise. I fear that > >>>>>>>>>>>> otherwise, the false positives will drive users to adopt > >>>>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of > >>>>>>>>>>>> bugs cannot be as readily detected. > >>>>>>>>>>>> > >>>>>>>>>>>> Tested on x86_64-linux. > >>>>>>>>>>>> > >>>>>>>>>>>> Martin > >>>>>>>>>>>> > >>>>>>>>>>>> PS There still are outstanding cases where the warning can > >>>>>>>>>>>> be avoided. I xfailed them in the test for now but will > >>>>>>>>>>>> still try to get them to work for GCC 9. > >>>>>>>>>>>> > >>>>>>>>>>>> gcc-87028.diff > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation > >>>>>>>>>>>> strncpy with global variable source string > >>>>>>>>>>>> gcc/ChangeLog: > >>>>>>>>>>>> > >>>>>>>>>>>> PR tree-optimization/87028 > >>>>>>>>>>>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid > >>>>>>>>>>>> folding when > >>>>>>>>>>>> statement doesn't belong to a basic block. > >>>>>>>>>>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle > >>>>>>>>>>>> MEM_REF on > >>>>>>>>>>>> the left hand side of assignment. > >>>>>>>>>>>> > >>>>>>>>>>>> gcc/testsuite/ChangeLog: > >>>>>>>>>>>> > >>>>>>>>>>>> PR tree-optimization/87028 > >>>>>>>>>>>> * c-c++-common/Wstringop-truncation.c: Remove xfails. > >>>>>>>>>>>> * gcc.dg/Wstringop-truncation-5.c: New test. > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > >>>>>>>>>>>> index 07341eb..284c2fb 100644 > >>>>>>>>>>>> --- a/gcc/gimple-fold.c > >>>>>>>>>>>> +++ b/gcc/gimple-fold.c > >>>>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy > >>>>>>>>>>>> (gimple_stmt_iterator *gsi, > >>>>>>>>>>>> if (tree_int_cst_lt (ssize, len)) > >>>>>>>>>>>> return false; > >>>>>>>>>>>> > >>>>>>>>>>>> + /* Defer warning (and folding) until the next statement in the > >>>>>>>>>>>> basic > >>>>>>>>>>>> + block is reachable. */ > >>>>>>>>>>>> + if (!gimple_bb (stmt)) > >>>>>>>>>>>> + return false; > >>>>>>>>>>> I think you want cfun->cfg as the test here. They should be > >>>>>>>>>>> equivalent > >>>>>>>>>>> in practice. > >>>>>>>>>> > >>>>>>>>>> Please do not add 'cfun' references. Note that the next stmt is > >>>>>>>>>> also accessible > >>>>>>>>>> when there is no CFG. I guess the issue is that we fold this > >>>>>>>>>> during > >>>>>>>>>> gimplification where the next stmt is not yet "there" (but still in > >>>>>>>>>> GENERIC)? > >>>>>>>>> That was my assumption. I almost suggested peeking at gsi_next and > >>>>>>>>> avoiding in that case. > >>>>>>>> > >>>>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then. > >>>>>>> So I think the concern with adding the guards to maybe_fold_stmt is > >>>>>>> the > >>>>>>> possibility of further fallout. > >>>>>>> > >>>>>>> I guess they could be written to target this case specifically to > >>>>>>> minimize fallout, but that feels like we're doing the same thing > >>>>>>> (band-aid) just in a different place. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>>>> > >>>>>>>>>> We generally do not want to have unfolded stmts in the IL when we > >>>>>>>>>> can avoid that > >>>>>>>>>> which is why we fold most stmts during gimplification. We also do > >>>>>>>>>> that because > >>>>>>>>>> we now do less folding on GENERIC. > >>>>>>>>> But an unfolded call in the IL should always be safe and we've got > >>>>>>>>> plenty of opportunities to fold it later. > >>>>>>>> > >>>>>>>> Well - we do. The very first one is forwprop though which means > >>>>>>>> we'll miss to > >>>>>>>> re-write some memcpy parts into SSA: > >>>>>>>> > >>>>>>>> NEXT_PASS (pass_ccp, false /* nonzero_p */); > >>>>>>>> /* After CCP we rewrite no longer addressed locals into SSA > >>>>>>>> form if possible. */ > >>>>>>>> NEXT_PASS (pass_forwprop); > >>>>>>>> > >>>>>>>> likewise early object-size will be confused by memcpy calls that just > >>>>>>>> exist > >>>>>>>> to avoid TBAA issues (another of our recommendations besides using > >>>>>>>> unions). > >>>>>>>> > >>>>>>>> We do fold mem* early for a reason ;) > >>>>>>>> > >>>>>>>> "We can always do warnings earlier" would be a similar true sentence. > >>>>>>> I'm not disagreeing at all. There's a natural tension between the > >>>>>>> benefits of folding early to enable more optimizations downstream and > >>>>>>> leaving the IL in a state where we can give actionable warnings. > >>>>>> > >>>>>> Similar trade-offs between folding early and losing information > >>>>>> as a result also impact high-level optimizations. > >>>>>> > >>>>>> For instance, folding the strlen argument below > >>>>>> > >>>>>> void f3 (struct A* p) > >>>>>> { > >>>>>> __builtin_strcpy (p->a, "123"); > >>>>>> > >>>>>> if (__builtin_strlen (p->a + 1) != 2) // not folded > >>>>>> __builtin_abort (); > >>>>>> } > >>>>>> > >>>>>> into > >>>>>> > >>>>>> _2 = &MEM[(void *)p_4(D) + 2B]; > >>>>>> > >>>>>> early on defeats the strlen optimization because there is no > >>>>>> mechanism to determine what member (void *)p_4(D) + 2B refers > >>>>>> to (this is bug 86955). > >>>>>> > >>>>>> Another example is folding of strlen calls with no-nconstant > >>>>>> offsets into constant strings like here: > >>>>>> > >>>>>> const char a[] = "123"; > >>>>>> > >>>>>> void f (int i) > >>>>>> { > >>>>>> if (__builtin_strlen (&a[i]) > 3) > >>>>>> __builtin_abort (); > >>>>>> } > >>>>>> > >>>>>> into sizeof a - 1 - i, which then prevents the result from > >>>>>> being folded to false (bug 86434), not to mention the code > >>>>>> it emits for out-of-bounds indices. > >>>>>> > >>>>>> There are a number of other similar examples in Bugzilla > >>>>>> that I've filed as I discovered then during testing my > >>>>>> warnings (e.g., 86572). > >>>>>> > >>>>>> In my mind, transforming library calls into "lossy" low-level > >>>>>> primitives like MEM_REF would be better done only after higher > >>>>>> level optimizations have had a chance to analyze them. Ditto > >>>>>> for other similar transformations (like to other library calls). > >>>>>> Having more accurate information helps both optimization and > >>>>>> warnings. It also makes the warnings more meaningful. > >>>>>> Printing "memcpy overflows a buffer" when the source code > >>>>>> has a call to strncpy is less than ideal. > >>>>>> > >>>>>>> Similarly there's a natural tension between warning early vs warning > >>>>>>> late. Code that triggers the warning may ultimately be proved > >>>>>>> unreachable, or we may discover simplifications that either > >>>>>>> suppress or > >>>>>>> expose a warning. > >>>>>>> > >>>>>>> There is no easy answer here. But I think we can legitimately ask > >>>>>>> questions. ie, does folding strnlen here really improve things > >>>>>>> downstream in ways that are measurable? Does the false positive > >>>>>>> really > >>>>>>> impact the utility of the warning? etc. > >>>>>>> > >>>>>>> I'd hazard a guess that Martin is particularly sensitive to false > >>>>>>> positives based on feedback he's received from our developer community > >>>>>>> as well as downstream consumers of his work. > >>>>>> > >>>>>> Yes. The kernel folks in particular have done a lot of work > >>>>>> cleaning up their code in an effort to adopt the warning and > >>>>>> attribute nonstring. They have been keeping me in the loop > >>>>>> on their progress (and feeding me back test cases with false > >>>>>> positives and negatives they run into). > >>>>> I can't recall seeing further guidance from Richi WRT putting the checks > >>>>> earlier (maybe_fold_stmt). > >>>>> > >>>>> If the point here is to avoid false positives by not folding strncpy, > >>>>> particularly in cases where we don't see the NUL in the copy, but it > >>>>> appears in a subsequent store, then let's be fairly selective (so as not > >>>>> to muck up things on the optimization side more than is necessary). > >>>>> > >>>>> ISTM we can do this by refactoring the warning bits so they're reusable > >>>>> at different points in the pipeline. Those bits would always return a > >>>>> boolean indicating if the given statement might generate a warning or > >>>>> not. > >>>>> > >>>>> When called early, they would not actually issue any warning. They > >>>>> would merely do the best analysis they can and return a status > >>>>> indicating whether or not the statement would generate a warning given > >>>>> current context. The goal here is to leave statements that might > >>>>> generate a warning as-is in the IL. > >>>>> > >>>>> When called late (assuming there is a point where we can walk the IL and > >>>>> issue the appropriate warnings), the routine would actually issue the > >>>>> warning. > >>>>> > >>>>> The kind of structure could potentially work for other builtins where we > >>>>> may need to look at subsequent statements to avoid false positives, but > >>>>> early folding hides cases by transforming the call into an undesirable > >>>>> form. > >>>>> > >>>>> Note that for cases where a call looks problematical early because we > >>>>> can't see statement which stores the terminator, but where the > >>>>> terminator statement ultimately becomes visible, we still get folding, > >>>>> it just happens later in the pipeline. > >>>>> > >>>>> Thoughts? > >>>> > >>>> The warning only triggers when the bound is less than or equal > >>>> to the length of the constant source string (i.e, when strncpy > >>>> truncates). So IIUC, your suggestion would defer folding only > >>>> such strncpy calls and let gimple_fold_builtin_strncpy fold > >>>> those with a constant bound that's greater than the length of > >>>> the constant source string. That would be fine with me, but > >>>> since strncpy calls with a bound that's greater than the length > >>>> of the source are pointless I don't think they are important > >>>> enough to worry about folding super early. The constant ones > >>>> that serve any purpose (and that are presumably important to > >>>> optimize) are those that truncate. > >>>> > >>>> That said, when optimization isn't enabled, I don't think users > >>>> expect calls to library functions to be transformed to calls to > >>>> other functions, or inlined. Yet that's just what GCC does. > >>>> For example, besides triggering the warning, the following: > >>>> > >>>> char a[4]; > >>>> > >>>> void f (char *s) > >>>> { > >>>> __builtin_strncpy (a, "1234", sizeof a); > >>>> a[3] = 0; > >>>> } > >>>> > >>>> is transformed, even at -O0, into: > >>>> > >>>> f (char * s) > >>>> { > >>>> <bb 2> : > >>>> MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"]; > >>>> a[3] = 0; > >>>> return; > >>>> } > >>>> > >>>> That doesn't seem right. GCC should avoid these transformations > >>>> at -O0, and one way to do that is to defer folding until the CFG > >>>> is constructed. The patch does it for strncpy but a more general > >>>> solution would do that for all calls, e.g., in maybe_fold_stmt > >>>> as Richard suggested (and I subsequently tested). > >>>> > >>>> Martin > >>> > >> >