On Thu, 2020-02-06 at 14:16 +0100, Richard Biener wrote:
> On Thu, Feb 6, 2020 at 2:00 PM Jeff Law <l...@redhat.com> wrote:
> > On Wed, 2020-02-05 at 09:19 +0100, Richard Biener wrote:
> > > On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor <mse...@gmail.com> wrote:
> > > > On 2/4/20 2:31 PM, Jeff Law wrote:
> > > > > On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> > > > > > On 2/4/20 12:15 PM, Richard Biener wrote:
> > > > > > > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law 
> > > > > > > <l...@redhat.com> wrote:
> > > > > > > > On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > > > > > > > > On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor 
> > > > > > > > > <mse...@gmail.com> wrote:
> > > > > > > > > > PR 93519 reports a false positive -Wrestrict issued for an 
> > > > > > > > > > inlined
> > > > > > > > call
> > > > > > > > > > to strcpy that carefully guards against self-copying.  This 
> > > > > > > > > > is
> > > > > > > > caused
> > > > > > > > > > by the caller's arguments substituted into the call during 
> > > > > > > > > > inlining
> > > > > > > > and
> > > > > > > > > > before dead code elimination.
> > > > > > > > > > 
> > > > > > > > > > The attached patch avoids this by removing -Wrestrict from 
> > > > > > > > > > the
> > > > > > > > folder
> > > > > > > > > > and deferring folding perfectly overlapping (and so 
> > > > > > > > > > undefined)
> > > > > > > > calls
> > > > > > > > > > to strcpy (and mempcpy, but not memcpy) until much later.  
> > > > > > > > > > Calls to
> > > > > > > > > > perfectly overlapping calls to memcpy are still folded 
> > > > > > > > > > early.
> > > > > > > > > 
> > > > > > > > > Why do we bother to warn at all for this case?  Just DWIM 
> > > > > > > > > here.
> > > > > > > > Warnings like
> > > > > > > > > this can be emitted from the analyzer?
> > > > > > > > They potentially can, but the analyzer is and will almost always
> > > > > > > > certainly be considerably slower.  I would not expect it to be 
> > > > > > > > used
> > > > > > > > nearly as much as the core compiler.
> > > > > > > > 
> > > > > > > > WHether or not a particular warning makes sense in the core 
> > > > > > > > compiler or
> > > > > > > > analyzer would seem to me to depend on whether or not we can 
> > > > > > > > reasonably
> > > > > > > > issue warnings without interprocedural analysis.  double-free
> > > > > > > > realistically requires interprocedural analysis to be 
> > > > > > > > effective.  I'm
> > > > > > > > not sure Wrestrict really does.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > That is, I suggest to simply remove the bogus warning code 
> > > > > > > > > from
> > > > > > > > folding
> > > > > > > > > (and _not_ fail the folding).
> > > > > > > > I haven't looked at the patch, but if we can get the warning 
> > > > > > > > out of the
> > > > > > > > folder that's certainly preferable.  And we could investigate 
> > > > > > > > deferring
> > > > > > > > self-copy removal.
> > > > > > > 
> > > > > > > I think the issue is as usual, warning for code we'll later 
> > > > > > > remove as dead. Warning at folding is almost always premature.
> > > > > > 
> > > > > > In this instance the code is reachable (or isn't obviously 
> > > > > > unreachable).
> > > > > > GCC doesn't remove it, but provides benign (and reasonable) 
> > > > > > semantics
> > > > > > for it(*).  To me, that's one aspect of quality.  Letting the user 
> > > > > > know
> > > > > > that the code is buggy is another.  I view that as at least as 
> > > > > > important
> > > > > > as folding the ill-effects away because it makes it possible to fix
> > > > > > the problem so the code works correctly even with compilers that 
> > > > > > don't
> > > > > > provide these benign semantics.
> > > > > If you look at the guts of what happens at the point where we issue 
> > > > > the
> > > > > warning from within gimple_fold_builtin_strcpy we have:
> > > > > 
> > > > > > DCH_to_char (char * in, char * out, int collid)
> > > > > > {
> > > > > >    int type;
> > > > > >    char * D.2148;
> > > > > >    char * dest;
> > > > > >    char * num;
> > > > > >    long unsigned int _4;
> > > > > >    char * _5;
> > > > > > 
> > > > > > ;;   basic block 2, loop depth 0
> > > > > > ;;    pred:       ENTRY
> > > > > > ;;    succ:       4
> > > > > > 
> > > > > > ;;   basic block 4, loop depth 0
> > > > > > ;;    pred:       2
> > > > > > ;;    succ:       5
> > > > > > 
> > > > > > ;;   basic block 5, loop depth 0
> > > > > > ;;    pred:       4
> > > > > > ;;    succ:       6
> > > > > > 
> > > > > > ;;   basic block 6, loop depth 0
> > > > > > ;;    pred:       5
> > > > > >    if (0 != 0)
> > > > > >      goto <bb 7>; [53.47%]
> > > > > >    else
> > > > > >      goto <bb 8>; [46.53%]
> > > > > > ;;    succ:       7
> > > > > > ;;                8
> > > > > > 
> > > > > > ;;   basic block 7, loop depth 0
> > > > > > ;;    pred:       6
> > > > > >    strcpy (out_1(D), out_1(D));
> > > > > > ;;    succ:       8
> > > > > > 
> > > > > > ;;   basic block 8, loop depth 0
> > > > > > ;;    pred:       6
> > > > > > ;;                7
> > > > > >    _4 = __builtin_strlen (out_1(D));
> > > > > >    _5 = out_1(D) + _4;
> > > > > >    __builtin_memcpy (_5, "foo", 4);
> > > > > > ;;    succ:       3
> > > > > > 
> > > > > > ;;   basic block 3, loop depth 0
> > > > > > ;;    pred:       8
> > > > > >    return;
> > > > > > ;;    succ:       EXIT
> > > > > > 
> > > > > > }
> > > > > > 
> > > > > 
> > > > > Which shows the code is obviously unreachable in the case we're 
> > > > > warning
> > > > > about.  You can't see this in the dumps because it's exposed by
> > > > > inlining, then cleaned up before writing the dump file.
> > > > 
> > > > In the specific case of the bug the code is of course eliminated
> > > > because it's guarded by the if (s != d).  I was referring to
> > > > the general (unguarded) case of:
> > > > 
> > > >    char *s = "", *p;
> > > > 
> > > >    int main (void)
> > > >    {
> > > >      p = strcpy (s, s);
> > > >      puts (p);
> > > >    }
> > > > 
> > > > where GCC folds the assignment 'p = strcpy(s, s);' to effectively
> > > > p = s;  That's perfectly reasonable but it could equally as well
> > > > leave the call alone, as it does when s is null, for instance.
> > > > 
> > > > I think folding it away is not only reasonable but preferable to
> > > > making the invalid call, but it's done only rarely.  Most of
> > > > the time GCC does emit the undefined access (it does that with
> > > > calls to library functions as well as with direct stores and
> > > > reads).  (I am hoping we can change that in the future so that
> > > > these kinds of problems are handled with some consistency.)
> > > > 
> > > > > ISTM this would be a case we could handle with the __builtin_warning
> > > > > stuff.
> > > > > 
> > > > > I think the question is do we want to do anything about it this cycle?
> > > > > 
> > > > > 
> > > > > If so, I think Martin's approach is quite reasonable.  It disables
> > > > > folding away the self-copies from gimple-fold and moves the warning
> > > > > into the expander.  So if there's such a call in the IL at expansion
> > > > > time we get a warning (-O0).
> > > > > 
> > > > > I'd hazard a guess that the diagnostic was added to the strlen pass to
> > > > > capture the missed warning when we're optimizing and the self-copy has
> > > > > survived until that point. There's a couple issues that raises though.
> > > > > 
> > > > > First, it's insufficient.  DSE (for example) can do self-copy removal,
> > > > > so it needs the same handling.  There may be others places too.
> > > > > 
> > > > > Second, if the code becomes unreachable after strlen, then we've got
> > > > > new false positive issues.
> > > > > 
> > > > > It's the classic problems we have with all middle end based warnings.
> > > > > 
> > > > > But I could live with those if we can show that using 
> > > > > __builtin_warning
> > > > > to handle this stuff in gcc-11 works...  ISTM we emit the
> > > > > __builtin_warning call before any self-copy like that, whenever we
> > > > > happen to spot them.  They'll naturally get removed if the path 
> > > > > becomes
> > > > > unreachable.  We'd warn during expansion for calls to
> > > > > __builtin_warning.  We could even optionally warn when removing a call
> > > > > to __bulitin_warning.
> > > > > 
> > > > > Thoughts?
> > > > 
> > > > The patch has pretty much the same effect as emitting __builtin_warning
> > > > from the folder would.  It defers the folding until much later, and if
> > > > the code isn't eliminated, it issues a warning and folds the call away.
> > > 
> > > But it affects subsequent optimizations - the call is more expensive
> > > in any size heuristics, it posses an (alias-set zero) memory write
> > > barrier (unless you start to optimize no-op copies in the alias oracle),
> > > it is a _call_ - passes like the vectorizer are not happy about a call.
> > > It prevents SRA of the accessed object, ...
> > Yes, it does affect subsequent optimizations, but realistically how
> > often do we have self-assignments?  And how often to we have them via
> > the str* functions.
> > 
> > Additionally these are undefined except for memmove.  I find it hard to
> > believe these are appearing it performance sensitive code.
> > 
> > So I'd claim it's equivalent from a practical standpoint.
> 
> But if the situation is so rare why does emitting the diagnostic matter 
> then...
Vulnerabilities are often on uncommon paths.

In the case of a self-assignment I'm not terribly worried because it's
a nop and the potential for exploitation is minimal -- so eliminating
it at a reasonable point and potentially losing the diagnostic seems
reasonable.

A non-perfect overlap is much more of a concern and definitely worth a
diagnostic IMHO.

> 
> Also it's not only optimization but eventually further diagnostic that only
> gets uncovered by optimizing such self-assignments.
Absolutely.  As I' fond of saying there's a natural tension here. 
There are no easy answers and most of the time the choices involve
trading off one against another.

> 
> > I guess I could do a Fedora build with a diagnostic to tell us how
> > often this happens.  Would you be willing to reconsider if the data
> > shows this just doesn't happen often enough to matter?
> 
> From my side its more on principle.
> 
> But see my two proposed patches to address the actual testcase
> (second one doing a PRE walk for fold_marked_stmts is the one I prefer).
They crossed in the ether.  Outgoing messages on my laptop had been
queued up due to a network issue.

jeff

Reply via email to