[RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-23 Thread Jeff Law

Martin,

I'd like your thoughts on this patch.

One of the things I'm working on is changes that would allow passes that
use dominator walks to trivially perform context sensitive range
analysis as a part of their dominator walk.

As I outlined earlier this would allow us to easily fix the false
positive sprintf warning reported a week or two ago.

This patch converts the sprintf warning code to perform a dominator walk
rather than just walking the blocks in whatever order they appear in the
basic block array.

>From an implementation standpoint we derive a new class sprintf_dom_walk
from the dom_walker class.  Like other dom walkers we walk statements
from within the before_dom_children member function.  Very standard stuff.

I moved handle_gimple_call and various dependencies into the
sprintf_dom_walker class to facilitate calling handle_gimple_call from
within the before_dom_children member function.  There's light fallout
in various places where the call_info structure was explicitly expected
to be found in the pass_sprintf_length class, but is now found in the
sprintf_dom_walker class.

This has been bootstrapped and regression tested on x86_64-linux-gnu.
I've also layered my embedded VRP analysis on top of this work and
verified that it does indeed fix the reported false positive.

Thoughts?

Jeff






* gimple-ssa-sprintf.c: Include domwalk.h.
(class sprintf_dom_walker): New class, derived from dom_walker.
(sprintf_dom_walker::before_dom_children): New function.
(struct call_info): Moved into sprintf_dom_walker class
(compute_formath_length, handle_gimple_call): Likewise.
(sprintf_length::execute): Call the dominator walker rather
than walking the statements.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 9770df7..2223f24 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "toplev.h"
 #include "substring-locations.h"
 #include "diagnostic.h"
+#include "domwalk.h"
 
 /* The likely worst case value of MB_LEN_MAX for the target, large enough
for UTF-8.  Ideally, this would be obtained by a target hook if it were
@@ -113,6 +114,19 @@ static int warn_level;
 
 struct format_result;
 
+class sprintf_dom_walker : public dom_walker
+{
+ public:
+  sprintf_dom_walker () : dom_walker (CDI_DOMINATORS) {}
+  ~sprintf_dom_walker () {}
+
+  virtual edge before_dom_children (basic_block);
+  bool handle_gimple_call (gimple_stmt_iterator *);
+
+  struct call_info;
+  bool compute_format_length (call_info &, format_result *);
+};
+
 class pass_sprintf_length : public gimple_opt_pass
 {
   bool fold_return_value;
@@ -135,10 +149,6 @@ public:
   fold_return_value = param;
 }
 
-  bool handle_gimple_call (gimple_stmt_iterator *);
-
-  struct call_info;
-  bool compute_format_length (call_info &, format_result *);
 };
 
 bool
@@ -976,7 +986,7 @@ bytes_remaining (unsigned HOST_WIDE_INT navail, const 
format_result &res)
 
 /* Description of a call to a formatted function.  */
 
-struct pass_sprintf_length::call_info
+struct sprintf_dom_walker::call_info
 {
   /* Function call statement.  */
   gimple *callstmt;
@@ -2348,7 +2358,7 @@ format_plain (const directive &dir, tree)
should be diagnosed given the AVAILable space in the destination.  */
 
 static bool
-should_warn_p (const pass_sprintf_length::call_info &info,
+should_warn_p (const sprintf_dom_walker::call_info &info,
   const result_range &avail, const result_range &result)
 {
   if (result.max <= avail.min)
@@ -2419,7 +2429,7 @@ should_warn_p (const pass_sprintf_length::call_info &info,
 
 static bool
 maybe_warn (substring_loc &dirloc, location_t argloc,
-   const pass_sprintf_length::call_info &info,
+   const sprintf_dom_walker::call_info &info,
const result_range &avail_range, const result_range &res,
const directive &dir)
 {
@@ -2716,7 +2726,7 @@ maybe_warn (substring_loc &dirloc, location_t argloc,
in *RES.  Return true if the directive has been handled.  */
 
 static bool
-format_directive (const pass_sprintf_length::call_info &info,
+format_directive (const sprintf_dom_walker::call_info &info,
  format_result *res, const directive &dir)
 {
   /* Offset of the beginning of the directive from the beginning
@@ -3004,7 +3014,7 @@ format_directive (const pass_sprintf_length::call_info 
&info,
the directive.  */
 
 static size_t
-parse_directive (pass_sprintf_length::call_info &info,
+parse_directive (sprintf_dom_walker::call_info &info,
 directive &dir, format_result *res,
 const char *str, unsigned *argno)
 {
@@ -3431,7 +3441,7 @@ parse_directive (pass_sprintf_length::call_info &info,
that caused the processing to be terminated early).  */
 
 bool
-pass_sprintf_length::compute_format_length (call_info &info,
+sprintf_dom_walker::compute_format_length (call_i

Re: [RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-24 Thread Martin Sebor

On 10/23/2017 05:14 PM, Jeff Law wrote:


Martin,

I'd like your thoughts on this patch.

One of the things I'm working on is changes that would allow passes that
use dominator walks to trivially perform context sensitive range
analysis as a part of their dominator walk.

As I outlined earlier this would allow us to easily fix the false
positive sprintf warning reported a week or two ago.

This patch converts the sprintf warning code to perform a dominator walk
rather than just walking the blocks in whatever order they appear in the
basic block array.

From an implementation standpoint we derive a new class sprintf_dom_walk
from the dom_walker class.  Like other dom walkers we walk statements
from within the before_dom_children member function.  Very standard stuff.

I moved handle_gimple_call and various dependencies into the
sprintf_dom_walker class to facilitate calling handle_gimple_call from
within the before_dom_children member function.  There's light fallout
in various places where the call_info structure was explicitly expected
to be found in the pass_sprintf_length class, but is now found in the
sprintf_dom_walker class.

This has been bootstrapped and regression tested on x86_64-linux-gnu.
I've also layered my embedded VRP analysis on top of this work and
verified that it does indeed fix the reported false positive.

Thoughts?


If it lets us improve the quality of the range information I can't
think of a downside.

Besides the sprintf pass, a number of other areas depend on ranges,
most of all the -Wstringop-overflow and truncation warnings and
now -Wrestrict (once my enhancement is approved).  It would be nice
to be able to get the same improvements there.  Does it mean that
those warnings will need to be moved into a standalone pass?  (I'm
not opposed to it, just wondering what to expect if this is
the route we want to go.)

Martin


Re: [RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-25 Thread Jeff Law
On 10/24/2017 11:35 AM, Martin Sebor wrote:
> On 10/23/2017 05:14 PM, Jeff Law wrote:
>>
>> Martin,
>>
>> I'd like your thoughts on this patch.
>>
>> One of the things I'm working on is changes that would allow passes that
>> use dominator walks to trivially perform context sensitive range
>> analysis as a part of their dominator walk.
>>
>> As I outlined earlier this would allow us to easily fix the false
>> positive sprintf warning reported a week or two ago.
>>
>> This patch converts the sprintf warning code to perform a dominator walk
>> rather than just walking the blocks in whatever order they appear in the
>> basic block array.
>>
>> From an implementation standpoint we derive a new class sprintf_dom_walk
>> from the dom_walker class.  Like other dom walkers we walk statements
>> from within the before_dom_children member function.  Very standard
>> stuff.
>>
>> I moved handle_gimple_call and various dependencies into the
>> sprintf_dom_walker class to facilitate calling handle_gimple_call from
>> within the before_dom_children member function.  There's light fallout
>> in various places where the call_info structure was explicitly expected
>> to be found in the pass_sprintf_length class, but is now found in the
>> sprintf_dom_walker class.
>>
>> This has been bootstrapped and regression tested on x86_64-linux-gnu.
>> I've also layered my embedded VRP analysis on top of this work and
>> verified that it does indeed fix the reported false positive.
>>
>> Thoughts?
> 
> If it lets us improve the quality of the range information I can't
> think of a downside.
It's potentially slower simply because the domwalk interface is more
expensive than just iterating over the blocks with FOR_EACH_BB.  But
that's about it.  I think the ability to get more accurate range
information will make the compile-time hit worth it.

> 
> Besides the sprintf pass, a number of other areas depend on ranges,
> most of all the -Wstringop-overflow and truncation warnings and
> now -Wrestrict (once my enhancement is approved).  It would be nice
> to be able to get the same improvements there.  Does it mean that
> those warnings will need to be moved into a standalone pass?  (I'm
> not opposed to it, just wondering what to expect if this is
> the route we want to go.)
They don't necessarily have to be a standalone pass -- they just have to
be implementable as part of a dominator walk to get the cheap context
sensitive range data.

So IIRC you've got some code to add additional warnings within the
strlen pass.  That pass is already a dominator walk.  In theory you'll
just add a member to the strlen_dom_walker class, then a call in
before_dom_children and after_dom_children virtuals and you should be
able to query the context sensitive range information.

For warnings that occur in code that is not easily structured as a
dominator walk, Andrew's work will definitely be a better choice.

Andrew's work will almost certainly also generate even finer grained
ranges because it can work on an arbitrary path through the CFG rather
than relying on dominance relationships.  Consider

A
   / \
  B   C
   \ /
D

Range information implied by the edge A->B is usable within B because
the edge A->B dominates B.  Similarly for range information implied by
A->C being available in C.  But range information implied by A->B is not
available in D because A->B does not dominate D.  SImilarly range
information implied by A->C is not available in D.

I touched on this in a private message recently.  Namely that exploiting
range data in non-dominated blocks feels a *lot* like jump threading and
should likely be structured as a backwards walk query (and thus is more
suitable for Andrew's infrastructure).


jeff


Re: [RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-26 Thread Richard Biener
On Wed, Oct 25, 2017 at 5:44 PM, Jeff Law  wrote:
> On 10/24/2017 11:35 AM, Martin Sebor wrote:
>> On 10/23/2017 05:14 PM, Jeff Law wrote:
>>>
>>> Martin,
>>>
>>> I'd like your thoughts on this patch.
>>>
>>> One of the things I'm working on is changes that would allow passes that
>>> use dominator walks to trivially perform context sensitive range
>>> analysis as a part of their dominator walk.
>>>
>>> As I outlined earlier this would allow us to easily fix the false
>>> positive sprintf warning reported a week or two ago.
>>>
>>> This patch converts the sprintf warning code to perform a dominator walk
>>> rather than just walking the blocks in whatever order they appear in the
>>> basic block array.
>>>
>>> From an implementation standpoint we derive a new class sprintf_dom_walk
>>> from the dom_walker class.  Like other dom walkers we walk statements
>>> from within the before_dom_children member function.  Very standard
>>> stuff.
>>>
>>> I moved handle_gimple_call and various dependencies into the
>>> sprintf_dom_walker class to facilitate calling handle_gimple_call from
>>> within the before_dom_children member function.  There's light fallout
>>> in various places where the call_info structure was explicitly expected
>>> to be found in the pass_sprintf_length class, but is now found in the
>>> sprintf_dom_walker class.
>>>
>>> This has been bootstrapped and regression tested on x86_64-linux-gnu.
>>> I've also layered my embedded VRP analysis on top of this work and
>>> verified that it does indeed fix the reported false positive.
>>>
>>> Thoughts?
>>
>> If it lets us improve the quality of the range information I can't
>> think of a downside.
> It's potentially slower simply because the domwalk interface is more
> expensive than just iterating over the blocks with FOR_EACH_BB.  But
> that's about it.  I think the ability to get more accurate range
> information will make the compile-time hit worth it.
>
>>
>> Besides the sprintf pass, a number of other areas depend on ranges,
>> most of all the -Wstringop-overflow and truncation warnings and
>> now -Wrestrict (once my enhancement is approved).  It would be nice
>> to be able to get the same improvements there.  Does it mean that
>> those warnings will need to be moved into a standalone pass?  (I'm
>> not opposed to it, just wondering what to expect if this is
>> the route we want to go.)
> They don't necessarily have to be a standalone pass -- they just have to
> be implementable as part of a dominator walk to get the cheap context
> sensitive range data.
>
> So IIRC you've got some code to add additional warnings within the
> strlen pass.  That pass is already a dominator walk.  In theory you'll
> just add a member to the strlen_dom_walker class, then a call in
> before_dom_children and after_dom_children virtuals and you should be
> able to query the context sensitive range information.
>
> For warnings that occur in code that is not easily structured as a
> dominator walk, Andrew's work will definitely be a better choice.
>
> Andrew's work will almost certainly also generate even finer grained
> ranges because it can work on an arbitrary path through the CFG rather
> than relying on dominance relationships.  Consider
>
> A
>/ \
>   B   C
>\ /
> D
>
> Range information implied by the edge A->B is usable within B because
> the edge A->B dominates B.  Similarly for range information implied by
> A->C being available in C.  But range information implied by A->B is not
> available in D because A->B does not dominate D.  SImilarly range
> information implied by A->C is not available in D.
>
> I touched on this in a private message recently.  Namely that exploiting
> range data in non-dominated blocks feels a *lot* like jump threading and
> should likely be structured as a backwards walk query (and thus is more
> suitable for Andrew's infrastructure).

On the contrary - with a backward walk you don't know which way to go.
>From D to B or to C?  With a forward walk there's no such ambiguity
(unless you start from A).

Note I have patches for EVRP merging ranges from B and C to make
the info available for D but usually there's nothing to recover here
that isn't also valid in A.  Just ranges derived from non-conditional
stmts (by means of exploiting undefined behavior) can help here.

Richard.

>
> jeff


Re: [RFA][PATCH] Convert sprintf warning code to use a dominator walk

2017-10-26 Thread Jeff Law
On 10/26/2017 03:09 AM, Richard Biener wrote:
> On Wed, Oct 25, 2017 at 5:44 PM, Jeff Law  wrote:
>> On 10/24/2017 11:35 AM, Martin Sebor wrote:
>>> On 10/23/2017 05:14 PM, Jeff Law wrote:

 Martin,

 I'd like your thoughts on this patch.

 One of the things I'm working on is changes that would allow passes that
 use dominator walks to trivially perform context sensitive range
 analysis as a part of their dominator walk.

 As I outlined earlier this would allow us to easily fix the false
 positive sprintf warning reported a week or two ago.

 This patch converts the sprintf warning code to perform a dominator walk
 rather than just walking the blocks in whatever order they appear in the
 basic block array.

 From an implementation standpoint we derive a new class sprintf_dom_walk
 from the dom_walker class.  Like other dom walkers we walk statements
 from within the before_dom_children member function.  Very standard
 stuff.

 I moved handle_gimple_call and various dependencies into the
 sprintf_dom_walker class to facilitate calling handle_gimple_call from
 within the before_dom_children member function.  There's light fallout
 in various places where the call_info structure was explicitly expected
 to be found in the pass_sprintf_length class, but is now found in the
 sprintf_dom_walker class.

 This has been bootstrapped and regression tested on x86_64-linux-gnu.
 I've also layered my embedded VRP analysis on top of this work and
 verified that it does indeed fix the reported false positive.

 Thoughts?
>>>
>>> If it lets us improve the quality of the range information I can't
>>> think of a downside.
>> It's potentially slower simply because the domwalk interface is more
>> expensive than just iterating over the blocks with FOR_EACH_BB.  But
>> that's about it.  I think the ability to get more accurate range
>> information will make the compile-time hit worth it.
>>
>>>
>>> Besides the sprintf pass, a number of other areas depend on ranges,
>>> most of all the -Wstringop-overflow and truncation warnings and
>>> now -Wrestrict (once my enhancement is approved).  It would be nice
>>> to be able to get the same improvements there.  Does it mean that
>>> those warnings will need to be moved into a standalone pass?  (I'm
>>> not opposed to it, just wondering what to expect if this is
>>> the route we want to go.)
>> They don't necessarily have to be a standalone pass -- they just have to
>> be implementable as part of a dominator walk to get the cheap context
>> sensitive range data.
>>
>> So IIRC you've got some code to add additional warnings within the
>> strlen pass.  That pass is already a dominator walk.  In theory you'll
>> just add a member to the strlen_dom_walker class, then a call in
>> before_dom_children and after_dom_children virtuals and you should be
>> able to query the context sensitive range information.
>>
>> For warnings that occur in code that is not easily structured as a
>> dominator walk, Andrew's work will definitely be a better choice.
>>
>> Andrew's work will almost certainly also generate even finer grained
>> ranges because it can work on an arbitrary path through the CFG rather
>> than relying on dominance relationships.  Consider
>>
>> A
>>/ \
>>   B   C
>>\ /
>> D
>>
>> Range information implied by the edge A->B is usable within B because
>> the edge A->B dominates B.  Similarly for range information implied by
>> A->C being available in C.  But range information implied by A->B is not
>> available in D because A->B does not dominate D.  SImilarly range
>> information implied by A->C is not available in D.
>>
>> I touched on this in a private message recently.  Namely that exploiting
>> range data in non-dominated blocks feels a *lot* like jump threading and
>> should likely be structured as a backwards walk query (and thus is more
>> suitable for Andrew's infrastructure).
> 
> On the contrary - with a backward walk you don't know which way to go.
> From D to B or to C?  With a forward walk there's no such ambiguity
> (unless you start from A).
> 
> Note I have patches for EVRP merging ranges from B and C to make
> the info available for D but usually there's nothing to recover here
> that isn't also valid in A.  Just ranges derived from non-conditional
> stmts (by means of exploiting undefined behavior) can help here.
My point is the range information that is specific to the A->B edge is
not usable to optimize D because it does not hold on all paths to D.
But it can be used to improve preciseness of warnings for code within D.


It's certainly possible to merge the range information for the A->B edge
and the information for the A->C edge as we enter D (in the original
graph).  For range data implied by the edge traversal I suspect what you
end up is rarely, if ever better than what we had in A.  But if there
are range generating