On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <mse...@gmail.com> wrote:
>
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>
> Please let me know if there is something I need to change here
> to make the fix acceptable or if I should stop trying.

I have one more comment about

+  /* Defer warning (and folding) until the next statement in the basic
+     block is reachable.  */
+  if (!gimple_bb (stmt))
+    return false;
+

it's not about the next statement in the basic-block being "reachable"
(even w/o a CFG you can use gsi_next()) but rather that the next
stmt isn't yet gimplified and thus not inserted into the gimple sequence,
right?  You apply this to gimple_fold_builtin_strncpy but I'd rather
see us not sprinkling this over gimple-fold.c but instead do this
in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.

See the attached (untested).

Richard.



> On 10/31/2018 10:33 AM, Martin Sebor wrote:
> > Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
> >
> > On 10/20/2018 06:01 PM, Martin Sebor wrote:
> >> On 10/16/2018 03:21 PM, Jeff Law wrote:
> >>> On 10/4/18 9:51 AM, Martin Sebor wrote:
> >>>> On 10/04/2018 08:58 AM, Jeff Law wrote:
> >>>>> On 8/27/18 9:42 AM, Richard Biener wrote:
> >>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <l...@redhat.com> wrote:
> >>>>>>>
> >>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
> >>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <l...@redhat.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>>>>>>>>> The warning suppression for -Wstringop-truncation looks for
> >>>>>>>>>> the next statement after a truncating strncpy to see if it
> >>>>>>>>>> adds a terminating nul.  This only works when the next
> >>>>>>>>>> statement can be reached using the Gimple statement iterator
> >>>>>>>>>> which isn't until after gimplification.  As a result, strncpy
> >>>>>>>>>> calls that truncate their constant argument that are being
> >>>>>>>>>> folded to memcpy this early get diagnosed even if they are
> >>>>>>>>>> followed by the nul assignment:
> >>>>>>>>>>
> >>>>>>>>>>   const char s[] = "12345";
> >>>>>>>>>>   char d[3];
> >>>>>>>>>>
> >>>>>>>>>>   void f (void)
> >>>>>>>>>>   {
> >>>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >>>>>>>>>>     d[sizeof d - 1] = 0;
> >>>>>>>>>>   }
> >>>>>>>>>>
> >>>>>>>>>> To avoid the warning I propose to defer folding strncpy to
> >>>>>>>>>> memcpy until the pointer to the basic block the strnpy call
> >>>>>>>>>> is in can be used to try to reach the next statement (this
> >>>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
> >>>>>>>>>> fold things early but in the case of strncpy (a relatively
> >>>>>>>>>> rarely used function that is often misused), getting
> >>>>>>>>>> the warning right while folding a bit later but still fairly
> >>>>>>>>>> early on seems like a reasonable compromise.  I fear that
> >>>>>>>>>> otherwise, the false positives will drive users to adopt
> >>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
> >>>>>>>>>> bugs cannot be as readily detected.
> >>>>>>>>>>
> >>>>>>>>>> Tested on x86_64-linux.
> >>>>>>>>>>
> >>>>>>>>>> Martin
> >>>>>>>>>>
> >>>>>>>>>> PS There still are outstanding cases where the warning can
> >>>>>>>>>> be avoided.  I xfailed them in the test for now but will
> >>>>>>>>>> still try to get them to work for GCC 9.
> >>>>>>>>>>
> >>>>>>>>>> gcc-87028.diff
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
> >>>>>>>>>> strncpy with global variable source string
> >>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>
> >>>>>>>>>>       PR tree-optimization/87028
> >>>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
> >>>>>>>>>> folding when
> >>>>>>>>>>       statement doesn't belong to a basic block.
> >>>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
> >>>>>>>>>> MEM_REF on
> >>>>>>>>>>       the left hand side of assignment.
> >>>>>>>>>>
> >>>>>>>>>> gcc/testsuite/ChangeLog:
> >>>>>>>>>>
> >>>>>>>>>>       PR tree-optimization/87028
> >>>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>>>>>>>>> index 07341eb..284c2fb 100644
> >>>>>>>>>> --- a/gcc/gimple-fold.c
> >>>>>>>>>> +++ b/gcc/gimple-fold.c
> >>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> >>>>>>>>>> (gimple_stmt_iterator *gsi,
> >>>>>>>>>>    if (tree_int_cst_lt (ssize, len))
> >>>>>>>>>>      return false;
> >>>>>>>>>>
> >>>>>>>>>> +  /* Defer warning (and folding) until the next statement in the
> >>>>>>>>>> basic
> >>>>>>>>>> +     block is reachable.  */
> >>>>>>>>>> +  if (!gimple_bb (stmt))
> >>>>>>>>>> +    return false;
> >>>>>>>>> I think you want cfun->cfg as the test here.  They should be
> >>>>>>>>> equivalent
> >>>>>>>>> in practice.
> >>>>>>>>
> >>>>>>>> Please do not add 'cfun' references.  Note that the next stmt is
> >>>>>>>> also accessible
> >>>>>>>> when there is no CFG.  I guess the issue is that we fold this
> >>>>>>>> during
> >>>>>>>> gimplification where the next stmt is not yet "there" (but still in
> >>>>>>>> GENERIC)?
> >>>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
> >>>>>>> avoiding in that case.
> >>>>>>
> >>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
> >>>>> So I think the concern with adding the guards to maybe_fold_stmt is
> >>>>> the
> >>>>> possibility of further fallout.
> >>>>>
> >>>>> I guess they could be written to target this case specifically to
> >>>>> minimize fallout, but that feels like we're doing the same thing
> >>>>> (band-aid) just in a different place.
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>>>
> >>>>>>>> We generally do not want to have unfolded stmts in the IL when we
> >>>>>>>> can avoid that
> >>>>>>>> which is why we fold most stmts during gimplification.  We also do
> >>>>>>>> that because
> >>>>>>>> we now do less folding on GENERIC.
> >>>>>>> But an unfolded call in the IL should always be safe and we've got
> >>>>>>> plenty of opportunities to fold it later.
> >>>>>>
> >>>>>> Well - we do.  The very first one is forwprop though which means
> >>>>>> we'll miss to
> >>>>>> re-write some memcpy parts into SSA:
> >>>>>>
> >>>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
> >>>>>>           /* After CCP we rewrite no longer addressed locals into SSA
> >>>>>>              form if possible.  */
> >>>>>>           NEXT_PASS (pass_forwprop);
> >>>>>>
> >>>>>> likewise early object-size will be confused by memcpy calls that just
> >>>>>> exist
> >>>>>> to avoid TBAA issues (another of our recommendations besides using
> >>>>>> unions).
> >>>>>>
> >>>>>> We do fold mem* early for a reason ;)
> >>>>>>
> >>>>>> "We can always do warnings earlier" would be a similar true sentence.
> >>>>> I'm not disagreeing at all.  There's a natural tension between the
> >>>>> benefits of folding early to enable more optimizations downstream and
> >>>>> leaving the IL in a state where we can give actionable warnings.
> >>>>
> >>>> Similar trade-offs between folding early and losing information
> >>>> as a result also impact high-level optimizations.
> >>>>
> >>>> For instance, folding the strlen argument below
> >>>>
> >>>>   void f3 (struct A* p)
> >>>>   {
> >>>>     __builtin_strcpy (p->a, "123");
> >>>>
> >>>>     if (__builtin_strlen (p->a + 1) != 2)   // not folded
> >>>>       __builtin_abort ();
> >>>>   }
> >>>>
> >>>> into
> >>>>
> >>>>   _2 = &MEM[(void *)p_4(D) + 2B];
> >>>>
> >>>> early on defeats the strlen optimization because there is no
> >>>> mechanism to determine what member (void *)p_4(D) + 2B refers
> >>>> to (this is bug 86955).
> >>>>
> >>>> Another example is folding of strlen calls with no-nconstant
> >>>> offsets into constant strings like here:
> >>>>
> >>>>   const char a[] = "123";
> >>>>
> >>>>   void f (int i)
> >>>>   {
> >>>>     if (__builtin_strlen (&a[i]) > 3)
> >>>>       __builtin_abort ();
> >>>>   }
> >>>>
> >>>> into sizeof a - 1 - i, which then prevents the result from
> >>>> being folded to false  (bug 86434), not to mention the code
> >>>> it emits for out-of-bounds indices.
> >>>>
> >>>> There are a number of other similar examples in Bugzilla
> >>>> that I've filed as I discovered then during testing my
> >>>> warnings (e.g., 86572).
> >>>>
> >>>> In my mind, transforming library calls into "lossy" low-level
> >>>> primitives like MEM_REF would be better done only after higher
> >>>> level optimizations have had a chance to analyze them.  Ditto
> >>>> for other similar transformations (like to other library calls).
> >>>> Having more accurate information helps both optimization and
> >>>> warnings.  It also makes the warnings more meaningful.
> >>>> Printing "memcpy overflows a buffer" when the source code
> >>>> has a call to strncpy is less than ideal.
> >>>>
> >>>>> Similarly there's a natural tension between warning early vs warning
> >>>>> late.  Code that triggers the warning may ultimately be proved
> >>>>> unreachable, or we may discover simplifications that either
> >>>>> suppress or
> >>>>> expose a warning.
> >>>>>
> >>>>> There is no easy answer here.  But I think we can legitimately ask
> >>>>> questions.  ie, does folding strnlen here really improve things
> >>>>> downstream in ways that are measurable?  Does the false positive
> >>>>> really
> >>>>> impact the utility of the warning?  etc.
> >>>>>
> >>>>> I'd hazard a guess that Martin is particularly sensitive to false
> >>>>> positives based on feedback he's received from our developer community
> >>>>> as well as downstream consumers of his work.
> >>>>
> >>>> Yes.  The kernel folks in particular have done a lot of work
> >>>> cleaning up their code in an effort to adopt the warning and
> >>>> attribute nonstring.  They have been keeping me in the loop
> >>>> on their progress (and feeding me back test cases with false
> >>>> positives and negatives they run into).
> >>> I can't recall seeing further guidance from Richi WRT putting the checks
> >>> earlier (maybe_fold_stmt).
> >>>
> >>> If the point here is to avoid false positives by not folding strncpy,
> >>> particularly in cases where we don't see the NUL in the copy, but it
> >>> appears in a subsequent store, then let's be fairly selective (so as not
> >>> to muck up things on the optimization side more than is necessary).
> >>>
> >>> ISTM we can do this by refactoring the warning bits so they're reusable
> >>> at different points in the pipeline.  Those bits would always return a
> >>> boolean indicating if the given statement might generate a warning or
> >>> not.
> >>>
> >>> When called early, they would not actually issue any warning.  They
> >>> would merely do the best analysis they can and return a status
> >>> indicating whether or not the statement would generate a warning given
> >>> current context.  The goal here is to leave statements that might
> >>> generate a warning as-is in the IL.
> >>>
> >>> When called late (assuming there is a point where we can walk the IL and
> >>> issue the appropriate warnings), the routine would actually issue the
> >>> warning.
> >>>
> >>> The kind of structure could potentially work for other builtins where we
> >>> may need to look at subsequent statements to avoid false positives, but
> >>> early folding hides cases by transforming the call into an undesirable
> >>> form.
> >>>
> >>> Note that for cases where a call looks problematical early because we
> >>> can't see statement which stores the terminator, but where the
> >>> terminator statement ultimately becomes visible, we still get folding,
> >>> it just happens later in the pipeline.
> >>>
> >>> Thoughts?
> >>
> >> The warning only triggers when the bound is less than or equal
> >> to the length of the constant source string (i.e, when strncpy
> >> truncates).  So IIUC, your suggestion would defer folding only
> >> such strncpy calls and let gimple_fold_builtin_strncpy fold
> >> those with a constant bound that's greater than the length of
> >> the constant source string.  That would be fine with me, but
> >> since strncpy calls with a bound that's greater than the length
> >> of the source are pointless I don't think they are important
> >> enough to worry about folding super early.  The constant ones
> >> that serve any purpose (and that are presumably important to
> >> optimize) are those that truncate.
> >>
> >> That said, when optimization isn't enabled, I don't think users
> >> expect calls to library functions to be transformed to calls to
> >> other  functions, or inlined.  Yet that's just what GCC does.
> >> For example, besides triggering the warning, the following:
> >>
> >>   char a[4];
> >>
> >>   void f (char *s)
> >>   {
> >>     __builtin_strncpy (a, "1234", sizeof a);
> >>     a[3] = 0;
> >>   }
> >>
> >> is transformed, even at -O0, into:
> >>
> >>   f (char * s)
> >>   {
> >>     <bb 2> :
> >>     MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
> >>     a[3] = 0;
> >>     return;
> >>   }
> >>
> >> That doesn't seem right.  GCC should avoid these transformations
> >> at -O0, and one way to do that is to defer folding until the CFG
> >> is constructed.  The patch does it for strncpy but a more general
> >> solution would do that for all calls, e.g., in maybe_fold_stmt
> >> as Richard suggested (and I subsequently tested).
> >>
> >> Martin
> >
>

Attachment: p
Description: Binary data

Reply via email to