Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
On 10/20/2018 06:01 PM, Martin Sebor wrote:
On 10/16/2018 03:21 PM, Jeff Law wrote:
On 10/4/18 9:51 AM, Martin Sebor wrote:
On 10/04/2018 08:58 AM, Jeff Law wrote:
On 8/27/18 9:42 AM, Richard Biener wrote:
On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <l...@redhat.com> wrote:
On 08/27/2018 02:29 AM, Richard Biener wrote:
On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <l...@redhat.com> wrote:
On 08/24/2018 09:58 AM, Martin Sebor wrote:
The warning suppression for -Wstringop-truncation looks for
the next statement after a truncating strncpy to see if it
adds a terminating nul. This only works when the next
statement can be reached using the Gimple statement iterator
which isn't until after gimplification. As a result, strncpy
calls that truncate their constant argument that are being
folded to memcpy this early get diagnosed even if they are
followed by the nul assignment:
const char s[] = "12345";
char d[3];
void f (void)
{
strncpy (d, s, sizeof d - 1); // -Wstringop-truncation
d[sizeof d - 1] = 0;
}
To avoid the warning I propose to defer folding strncpy to
memcpy until the pointer to the basic block the strnpy call
is in can be used to try to reach the next statement (this
happens as early as ccp1). I'm aware of the preference to
fold things early but in the case of strncpy (a relatively
rarely used function that is often misused), getting
the warning right while folding a bit later but still fairly
early on seems like a reasonable compromise. I fear that
otherwise, the false positives will drive users to adopt
other unsafe solutions (like memcpy) where these kinds of
bugs cannot be as readily detected.
Tested on x86_64-linux.
Martin
PS There still are outstanding cases where the warning can
be avoided. I xfailed them in the test for now but will
still try to get them to work for GCC 9.
gcc-87028.diff
PR tree-optimization/87028 - false positive -Wstringop-truncation
strncpy with global variable source string
gcc/ChangeLog:
PR tree-optimization/87028
* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
folding when
statement doesn't belong to a basic block.
* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
MEM_REF on
the left hand side of assignment.
gcc/testsuite/ChangeLog:
PR tree-optimization/87028
* c-c++-common/Wstringop-truncation.c: Remove xfails.
* gcc.dg/Wstringop-truncation-5.c: New test.
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 07341eb..284c2fb 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
(gimple_stmt_iterator *gsi,
if (tree_int_cst_lt (ssize, len))
return false;
+ /* Defer warning (and folding) until the next statement in the
basic
+ block is reachable. */
+ if (!gimple_bb (stmt))
+ return false;
I think you want cfun->cfg as the test here. They should be
equivalent
in practice.
Please do not add 'cfun' references. Note that the next stmt is
also accessible
when there is no CFG. I guess the issue is that we fold this during
gimplification where the next stmt is not yet "there" (but still in
GENERIC)?
That was my assumption. I almost suggested peeking at gsi_next and
avoiding in that case.
So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
So I think the concern with adding the guards to maybe_fold_stmt is the
possibility of further fallout.
I guess they could be written to target this case specifically to
minimize fallout, but that feels like we're doing the same thing
(band-aid) just in a different place.
We generally do not want to have unfolded stmts in the IL when we
can avoid that
which is why we fold most stmts during gimplification. We also do
that because
we now do less folding on GENERIC.
But an unfolded call in the IL should always be safe and we've got
plenty of opportunities to fold it later.
Well - we do. The very first one is forwprop though which means
we'll miss to
re-write some memcpy parts into SSA:
NEXT_PASS (pass_ccp, false /* nonzero_p */);
/* After CCP we rewrite no longer addressed locals into SSA
form if possible. */
NEXT_PASS (pass_forwprop);
likewise early object-size will be confused by memcpy calls that just
exist
to avoid TBAA issues (another of our recommendations besides using
unions).
We do fold mem* early for a reason ;)
"We can always do warnings earlier" would be a similar true sentence.
I'm not disagreeing at all. There's a natural tension between the
benefits of folding early to enable more optimizations downstream and
leaving the IL in a state where we can give actionable warnings.
Similar trade-offs between folding early and losing information
as a result also impact high-level optimizations.
For instance, folding the strlen argument below
void f3 (struct A* p)
{
__builtin_strcpy (p->a, "123");
if (__builtin_strlen (p->a + 1) != 2) // not folded
__builtin_abort ();
}
into
_2 = &MEM[(void *)p_4(D) + 2B];
early on defeats the strlen optimization because there is no
mechanism to determine what member (void *)p_4(D) + 2B refers
to (this is bug 86955).
Another example is folding of strlen calls with no-nconstant
offsets into constant strings like here:
const char a[] = "123";
void f (int i)
{
if (__builtin_strlen (&a[i]) > 3)
__builtin_abort ();
}
into sizeof a - 1 - i, which then prevents the result from
being folded to false (bug 86434), not to mention the code
it emits for out-of-bounds indices.
There are a number of other similar examples in Bugzilla
that I've filed as I discovered then during testing my
warnings (e.g., 86572).
In my mind, transforming library calls into "lossy" low-level
primitives like MEM_REF would be better done only after higher
level optimizations have had a chance to analyze them. Ditto
for other similar transformations (like to other library calls).
Having more accurate information helps both optimization and
warnings. It also makes the warnings more meaningful.
Printing "memcpy overflows a buffer" when the source code
has a call to strncpy is less than ideal.
Similarly there's a natural tension between warning early vs warning
late. Code that triggers the warning may ultimately be proved
unreachable, or we may discover simplifications that either suppress or
expose a warning.
There is no easy answer here. But I think we can legitimately ask
questions. ie, does folding strnlen here really improve things
downstream in ways that are measurable? Does the false positive really
impact the utility of the warning? etc.
I'd hazard a guess that Martin is particularly sensitive to false
positives based on feedback he's received from our developer community
as well as downstream consumers of his work.
Yes. The kernel folks in particular have done a lot of work
cleaning up their code in an effort to adopt the warning and
attribute nonstring. They have been keeping me in the loop
on their progress (and feeding me back test cases with false
positives and negatives they run into).
I can't recall seeing further guidance from Richi WRT putting the checks
earlier (maybe_fold_stmt).
If the point here is to avoid false positives by not folding strncpy,
particularly in cases where we don't see the NUL in the copy, but it
appears in a subsequent store, then let's be fairly selective (so as not
to muck up things on the optimization side more than is necessary).
ISTM we can do this by refactoring the warning bits so they're reusable
at different points in the pipeline. Those bits would always return a
boolean indicating if the given statement might generate a warning or
not.
When called early, they would not actually issue any warning. They
would merely do the best analysis they can and return a status
indicating whether or not the statement would generate a warning given
current context. The goal here is to leave statements that might
generate a warning as-is in the IL.
When called late (assuming there is a point where we can walk the IL and
issue the appropriate warnings), the routine would actually issue the
warning.
The kind of structure could potentially work for other builtins where we
may need to look at subsequent statements to avoid false positives, but
early folding hides cases by transforming the call into an undesirable
form.
Note that for cases where a call looks problematical early because we
can't see statement which stores the terminator, but where the
terminator statement ultimately becomes visible, we still get folding,
it just happens later in the pipeline.
Thoughts?
The warning only triggers when the bound is less than or equal
to the length of the constant source string (i.e, when strncpy
truncates). So IIUC, your suggestion would defer folding only
such strncpy calls and let gimple_fold_builtin_strncpy fold
those with a constant bound that's greater than the length of
the constant source string. That would be fine with me, but
since strncpy calls with a bound that's greater than the length
of the source are pointless I don't think they are important
enough to worry about folding super early. The constant ones
that serve any purpose (and that are presumably important to
optimize) are those that truncate.
That said, when optimization isn't enabled, I don't think users
expect calls to library functions to be transformed to calls to
other functions, or inlined. Yet that's just what GCC does.
For example, besides triggering the warning, the following:
char a[4];
void f (char *s)
{
__builtin_strncpy (a, "1234", sizeof a);
a[3] = 0;
}
is transformed, even at -O0, into:
f (char * s)
{
<bb 2> :
MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
a[3] = 0;
return;
}
That doesn't seem right. GCC should avoid these transformations
at -O0, and one way to do that is to defer folding until the CFG
is constructed. The patch does it for strncpy but a more general
solution would do that for all calls, e.g., in maybe_fold_stmt
as Richard suggested (and I subsequently tested).
Martin