Re: Constraints and branching in -fanalyzer

2021-02-20 Thread brian.sobulefsky via Gcc
To be clear, I only solved the lesser problem

if(idx-- > 0)
  __analyzer_eval(idx >= 0);

which is a stepping-stone problem. I correctly surmised that this was failing
(with the prefix operator and the -= operator working as expected) because the
condition that is constrained in the postfix case is on the old value of idx,
while the condition being evaluated is on the new value. I can send you a patch,
but the short version is that the initial value of idx is constrained, then a
binop_svalue is stored and eventually ends up in __analyzer_eval. Adding a case
in constraint_manager::eval_condition to take apart binop_svalues and recurse,
the way you were imagining would be necessary, is basically all that is needed
to solve that one. Currently, the constraint_manager just looks at
that binop_svalue and determines that it does not know any rules for it, because
the rule it knows about is actually for one of its arguments.
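
To make the idea concrete, here is a toy model in C (not the analyzer's
actual classes): the names tristate, sval and eval_ge are all hypothetical,
and the only operation modelled is "operand minus constant". The point is
just that a query on (x - c) can be answered by recursing into a query on x,
where the constraint is actually recorded. Wrap-around of x - c in the
analyzed program is deliberately ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Three-valued result, mirroring the TRUE/FALSE/UNKNOWN that
   __analyzer_eval reports.  */
typedef enum { TS_FALSE, TS_TRUE, TS_UNKNOWN } tristate;

/* Hypothetical stand-in for an svalue: either a plain symbol with an
   optional known lower bound, or "operand - constant".  */
typedef enum { SV_SYMBOL, SV_MINUS_CONST } sv_kind;

typedef struct sval {
  sv_kind kind;
  int has_lower_bound;         /* SV_SYMBOL: is a lower bound known?  */
  long lower_bound;            /* SV_SYMBOL: value >= lower_bound.  */
  const struct sval *operand;  /* SV_MINUS_CONST: left-hand side.  */
  long constant;               /* SV_MINUS_CONST: right-hand side.  */
} sval;

/* Evaluate "sv >= rhs" using only recorded lower bounds.  */
tristate eval_ge (const sval *sv, long rhs)
{
  assert (sv != NULL);
  switch (sv->kind)
    {
    case SV_SYMBOL:
      if (sv->has_lower_bound && sv->lower_bound >= rhs)
        return TS_TRUE;
      return TS_UNKNOWN;

    case SV_MINUS_CONST:
      /* (x - c) >= rhs  is the same question as  x >= rhs + c,
         so take the binop apart and recurse on its operand.
         Give up if rhs + c would overflow.  */
      if (sv->constant > 0
          ? rhs > LONG_MAX - sv->constant
          : rhs < LONG_MIN - sv->constant)
        return TS_UNKNOWN;
      return eval_ge (sv->operand, rhs + sv->constant);
    }
  return TS_UNKNOWN;
}
```

With old_idx constrained to be >= 1 (that is, "idx-- > 0" held) and new_idx
modelled as old_idx - 1, eval_ge (&new_idx, 0) recurses to "old_idx >= 1"
and answers TS_TRUE, whereas a lookup on new_idx alone would find no rule.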

I was hoping this would be it for the original loop problem, but like I said,
it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I
found out happens for the other operators in a for loop as well. The first
TRUE comes from my binop_svalue handler, but the second UNKNOWN comes from
the merging of the bottom of the loop back with the entry point.

Since that is fairly abstract, when I found the case I told you about,
I decided to see if I could fix it, because merging >0 with =0 into >=0
for a linear CFG should not be too hard.
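
As a sketch of what "merging >0 with =0 into >=0" means, assuming a simple
closed-interval representation (the range type and range_join below are
hypothetical, not analyzer code): joining two ranges takes the smaller lower
bound and the larger upper bound. For these two particular ranges the result
is exact, since they are adjacent.

```c
#include <assert.h>
#include <limits.h>

/* Closed range lo <= value <= hi; LONG_MAX stands for "no upper bound".  */
typedef struct { long lo; long hi; } range;

/* Smallest range containing both inputs.  */
range range_join (range a, range b)
{
  range r;
  assert (a.lo <= a.hi && b.lo <= b.hi);
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  return r;
}
```

Joining { 1, LONG_MAX } (idx > 0) with { 0, 0 } (idx == 0) yields
{ 0, LONG_MAX }, i.e. idx >= 0, rather than discarding both facts.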




gcc-10-20210220 is now available

2021-02-20 Thread GCC Administrator via Gcc
Snapshot gcc-10-20210220 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20210220/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision 82560ad9d0079777f2298ec65089dec7f03b05db

You'll find:

 gcc-10-20210220.tar.xz   Complete GCC

  SHA256=5949ac57af3dd740cef6723b687e1f1d8cf46e88ce60f68d365ba7d2cf6d1d4e
  SHA1=142ede3dd5c3589384bd52be109ec2e8a6277f9c

Diffs from 10-20210213 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Constraints and branching in -fanalyzer

2021-02-20 Thread David Malcolm via Gcc
[Moving this discussion from offlist to the GCC mailing list (with
permission) and tweaking the subject]

On Sat, 2021-02-20 at 02:57 +, brian.sobulefsky wrote:
> Yeah, it's a lot to take in. For the last one, it was just about
> storing and retrieving data and I ignored everything else about the
> analyzer, and that was hard enough.

Well done on making it this far; I'm impressed that you're diving into
some of the more difficult aspects of this code, and seem to be coping.

> I am working on PR94362, which originates from a false positive found
> compiling openssl. It effectively amounted to not knowing that idx >=
> 0 within the loop for (; idx-- > 0;).
> 
> It turns out there are two problems here. One has to do with the
> postfix operator, and yes, the analyzer currently does not know that
> idx >= 0 within an if block like if (idx-- > 0). That problem was easy
> and I got it to work within a few days with a relatively simple
> patch. I thought I was going to be submitting again.
> 
> The second part is hard. It has to do with state merging and has
> nothing to do with the postfix operator. It fails for all sorts of
> operators when looping. In fact, the following fails:
> 
> if(idx < 0)
>   idx = 0;
> __analyzer_eval(idx >= 0);
> 
> which is devastating if you are hoping the analyzer can "understand"
> a loop. Even with my first fix (which gives one TRUE when run on a
> for loop), there is the second "iterated" pass, which ends up with a
> widening_svalue (I'm still trying to wrap my head around that one
> too), that gives an UNKNOWN.

FWIW "widening" in this context is taken from abstract interpretation;
see e.g. the early papers by Patrick and Radhia Cousot; the idea is to
jump ahead of an infinitely descending chain of values to instead go
directly to a fixed point in a (small) finite number of steps.  (I've
not attempted the narrowing approach, which refines it further to get a
tighter approximation).
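
As a rough illustration of widening, here is the textbook interval version
in C. This is a sketch of the general technique, not of how widening_svalue
is implemented; the interval type and all function names are made up for the
example. Without widening, the loop-head state for "i = 0; for (;;) i++;"
would go [0,0], [0,1], [0,2], ... and never stabilize; widening pushes any
bound that moved straight to infinity.

```c
#include <assert.h>
#include <limits.h>

/* Closed interval; LONG_MIN / LONG_MAX stand for -inf / +inf.  */
typedef struct { long lo; long hi; } interval;

/* Smallest interval containing both inputs.  */
static interval interval_join (interval a, interval b)
{
  interval r;
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  return r;
}

/* Widening: any bound that moved since PREV jumps to infinity.  */
static interval interval_widen (interval prev, interval next)
{
  interval w;
  w.lo = next.lo < prev.lo ? LONG_MIN : prev.lo;
  w.hi = next.hi > prev.hi ? LONG_MAX : prev.hi;
  return w;
}

/* Abstract effect of "i++", leaving infinite bounds alone.  */
static interval interval_increment (interval a)
{
  if (a.lo != LONG_MIN && a.lo != LONG_MAX)
    a.lo++;
  if (a.hi != LONG_MAX)
    a.hi++;
  return a;
}

/* Analyze the loop head of "i = 0; for (;;) i++;".  Stores the stable
   state in *OUT and returns how many rounds were needed.  */
int analyze_counting_loop (interval *out)
{
  const interval entry = { 0, 0 };
  interval state = entry;
  int rounds = 0;

  assert (out != 0);
  for (;;)
    {
      /* Merge the loop entry with the state coming round the back edge.  */
      interval next = interval_join (entry, interval_increment (state));
      interval widened = interval_widen (state, next);
      rounds++;
      if (widened.lo == state.lo && widened.hi == state.hi)
        break;
      state = widened;
    }
  *out = state;
  return rounds;
}
```

Here the first round widens [0,0] to [0,+inf] and the second round confirms
that nothing changes, so the fixed point is reached in two steps. The price
is precision: any upper bound the loop condition might have implied is lost,
which is what narrowing would try to recover.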


> So I am trying to follow how states are merged, but that means I need
> to at least sort of understand the graphs. I do know that the actual
> merging follows in the PK_AFTER_SUPERNODE branch, with the call to
> node->on_edge, which eventually gets you to maybe_update_for_edge and
> the for_each_fact iterator.

I have spent far too many hours poring over graph dumps from the
analyzer, and yes, grokking the state merging is painful, and I'm sure
there are many bugs.

Are you familiar with the various dump formats for the graph?  In
particular the .dot ones?  FWIW I use xdot.py for viewing them:
  https://github.com/jrfonseca/xdot.py
(and indeed am the maintainer of the Fedora package for it); it has a
relatively quick and scalable UI for navigating graphs, but at some
point even it can't cope.
I started writing a dedicated visualizer that uses some of xdot.py's
classes:
  https://github.com/davidmalcolm/gcc-analyzer-viewer
but it's early days for that.



> I watched a merge in the debugger yesterday for the if example above
> and watched the unknown_svalues be made for the merged state, but it
> was still too much to take in all at once for me to know where the
> solution is.

One other nasty problem with the state merging code is that any time I
touch it, there are knock-on effects where other things break (e.g.
loop analysis stops converging), and as I fix those, yet more things
break, which is demoralizing (3 steps forward, 2 steps back).

Finding ways to break problems down into smaller chunks seems to be the
key here.

It sounds like you're making progress with the:

  if (idx < 0)
    idx = 0;
  __analyzer_eval (idx >= 0);

case.  Does your fix at least work outside of a loop, without
regressing things? (Or, if it does, what regresses?)  If so, then it
could be turned into a minimal testcase and we could at least fix that.
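
A minimal testcase along those lines might look something like the sketch
below. The function names test_clamp and test_postfix are made up, and the
stub definition of __analyzer_eval is an assumption added only so that the
file links and runs standalone; a real testsuite submission would presumably
rely on the declaration the analyzer testsuite already provides, plus
whatever expected-result annotations that testsuite uses.

```c
/* Stub so the file can be linked and run outside the testsuite.
   It just counts calls and how many of them saw a true argument.  */
static int eval_calls;
static int eval_true;

void __analyzer_eval (int value)
{
  eval_calls++;
  if (value)
    eval_true++;
}

/* The state-merging case: no loop, no postfix operator.  */
void test_clamp (int idx)
{
  if (idx < 0)
    idx = 0;
  __analyzer_eval (idx >= 0);
}

/* The postfix stepping-stone case from the same discussion.  */
void test_postfix (int idx)
{
  if (idx-- > 0)
    __analyzer_eval (idx >= 0);
}
```

At run time every call that is reached evaluates to true, which is the
answer one would hope the analyzer reports for both functions.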


FWIW I've experimented with better ways to handle loops in the
analyzer.  One approach is that GCC already has its own loop analysis
framework.  At the point where -fanalyzer runs, the IR has captured the
nesting structure of loops in the code, so we might want to make use of
that in our heuristics.  Sadly, as far as I can tell, any attempts to
go beyond that and reuse GCC's scalar-value-evolution code (for
handling loop bounds and iterations) require us to enable modification
of the CFGs, which is a no-no for -fanalyzer.

(Loop handling is one of the most important and difficult issues within
the analyzer implementation.  That said, in the last few days I've been
ignoring it, and have been focusing instead on a rewrite of how we find
the shortest feasible path for each diagnostic, since there's another
cluster of analyzer bugs relating to false negatives in that; I hope to
land a big fix for feasibility handling next week - and to finish
reviewing your existing patch (sorry!))

Hope this is helpful.  I'm quoting the rest of our exchange below (for
the mailing list)

Dave


> 
> Sent with ProtonMail Secure Email.
> 
> ‐‐‐ Original Message ‐‐‐
> On Friday, Februar

thanx

2021-02-20 Thread Steve via Gcc
I just tried to use Microsoft's Visual C++ and it will no longer compile
a simple C program. I cannot thank your team enough for keeping the art
of programming available to those who want to learn.



thanks for avr-gcc, prop-gcc, and of course gcc! thanx for preserving my 
freedom to program!








Re: using undeclared function returning bool results in wrong return value

2021-02-20 Thread David Brown



On 20/02/2021 16:46, David Malcolm wrote:
> On Sat, 2021-02-20 at 15:25 +0100, David Brown wrote:


> 
> I think we need to think about both of these use-cases e.g. as we
> implement our diagnostics, and that we should mention this distinction
> in our UX guidelines...
> 
>> Is it possible to distinguish these uses, and then have different
>> default flags?  Perhaps something as simple as looking at the name
>> used to call the compiler - "cc" or "gcc" ?
>>
> 
> ...but I'm wary of having an actual distinction between them in the
> code; it seems like a way to complicate things and lead to "weird"
> build failures.
> 

Fair enough.

> Thought experiment: what might a "--this-is-my-code" option do?
> 

It should read the programmer's mind and tell them of any discrepancies
between what they wrote and what they meant :-)

I'd say it should make "-Wall" the default, and complain if "-std" is
not specified explicitly, and if there is no "-O" flag (or a #pragma GCC
optimize early in the code - even if it is an explicit -O0).  That would
cover things that I see people getting wrong regularly.

(I am a big fan of explicit rather than implicit, and having default
behaviour be a complaint that you are relying on default behaviour.  But
I may not be a typical user.)
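
Purely as an illustration of that thought experiment, the "complain about
relying on defaults" part could be sketched as a check over the command
line. Everything here is hypothetical (the function name, the MISSING_*
values, and the idea of doing this in a wrapper); it only recognizes the
common "-std=" and "-O..." spellings and does not look for pragmas in the
source.

```c
#include <assert.h>
#include <string.h>

enum { MISSING_STD = 1, MISSING_OPT = 2 };

/* Return a bitmask of the flags the user left implicit.
   argv[0] is taken to be the compiler name and is skipped.  */
unsigned missing_explicit_flags (int argc, char *const argv[])
{
  unsigned missing = MISSING_STD | MISSING_OPT;
  int i;

  assert (argv != NULL);
  for (i = 1; i < argc; i++)
    {
      if (strncmp (argv[i], "-std=", 5) == 0)
        missing &= ~(unsigned) MISSING_STD;
      else if (strncmp (argv[i], "-O", 2) == 0)
        /* Any explicit level counts, including -O0.  */
        missing &= ~(unsigned) MISSING_OPT;
    }
  return missing;
}
```

For "gcc -Wall foo.c" this reports both flags as missing, while
"gcc -std=c11 -O0 foo.c" reports none.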

> Hope this is constructive
> Dave
> 


Re: using undeclared function returning bool results in wrong return value

2021-02-20 Thread David Malcolm via Gcc
On Sat, 2021-02-20 at 15:25 +0100, David Brown wrote:
> On 19/02/2021 12:18, Jonathan Wakely via Gcc wrote:
> > On Fri, 19 Feb 2021 at 09:42, David Brown wrote:
> > > Just to be clear - I am not in any way suggesting that this
> > > situation is the fault of any gcc developers.  If configure
> > > scripts are failing because they rely on poor C code or
> > > inappropriate use of gcc (code that requires a particular C
> > > standard should specify it - gcc has the "-std=" flags for that
> > > purpose), then the maintainers of those scripts should fix them.
> > > If Fedora won't build just because the C compiler insists C code
> > > is written in C, then the Fedora folk need to fix their build
> > > system.
> > 
> > It's not Fedora's build system, it's the packages in Fedora's build
> > systems. Lots of them. And those same packages are in every other
> > Linux distro, so everybody needs to fix them.
> > 
> 
> It seems to me that there are two very different uses of gcc going on
> here.  (I'm just throwing up some ideas here - if people think they
> are daft, wrong or impractical, feel free to throw them out again!  I
> am trying to think of ways to make it easier for people to see that
> there are problems with their C or C++ code, without requiring
> impractical changes on large numbers of configuration files and build
> setups.)
> 
> gcc can be used as a development tool - it is an aid when writing
> code, and helps you write better code.  Here warnings of all sorts are
> useful, as it is better to find potential or real problems as early as
> possible in the development process.  Even warnings about style are
> important because they improve the long-term maintainability of the
> code.
> 
> gcc can also be used to build existing code - for putting together
> distributions, installing on your own machine, etc.  Here flags such
> as "-march=native" can be useful but non-critical warnings are not,
> because the person (or program) running the compiler is not a
> developer of the code.  This use is as a "system C compiler".

I think there's an important insight here, in that there's a
distinction between:

(a) the edit-compile-debug cycle where the user is actively hacking on
the code themself (perhaps a project they wrote, or someone else's),
where they just made a change to the code and want to see what
happens, 

as opposed to

(b) a batch rebuild setting, where the user is recompiling a package,
and GCC is a detail that's being invoked by a hierarchy of build
systems (e.g. a Fedora mass rebuild that invokes koji, that invokes
rpmbuild, that invokes some build tool, which eventually invokes gcc);
perhaps a dependency changed, and the user is curious about what breaks
(and hoping that nothing does, since they know nothing about this
particular code, maybe they're just trying to get the distro to boot on
some new architecture).

I think we need to think about both of these use-cases e.g. as we
implement our diagnostics, and that we should mention this distinction
in our UX guidelines...

> Is it possible to distinguish these uses, and then have different
> default flags?  Perhaps something as simple as looking at the name
> used to call the compiler - "cc" or "gcc" ?
> 

...but I'm wary of having an actual distinction between them in the
code; it seems like a way to complicate things and lead to "weird"
build failures.

Thought experiment: what might a "--this-is-my-code" option do?

Hope this is constructive
Dave



Re: using undeclared function returning bool results in wrong return value

2021-02-20 Thread David Brown
On 19/02/2021 12:18, Jonathan Wakely via Gcc wrote:
> On Fri, 19 Feb 2021 at 09:42, David Brown wrote:
>> Just to be clear - I am not in any way suggesting that this situation is
>> the fault of any gcc developers.  If configure scripts are failing
>> because they rely on poor C code or inappropriate use of gcc (code that
>> requires a particular C standard should specify it - gcc has the "-std="
>> flags for that purpose), then the maintainers of those scripts should
>> fix them.  If Fedora won't build just because the C compiler insists C
>> code is written in C, then the Fedora folk need to fix their build system.
> 
> It's not Fedora's build system, it's the packages in Fedora's build
> systems. Lots of them. And those same packages are in every other
> Linux distro, so everybody needs to fix them.
> 

It seems to me that there are two very different uses of gcc going on
here.  (I'm just throwing up some ideas here - if people think they are
daft, wrong or impractical, feel free to throw them out again!  I am
trying to think of ways to make it easier for people to see that there
are problems with their C or C++ code, without requiring impractical
changes on large numbers of configuration files and build setups.)

gcc can be used as a development tool - it is an aid when writing code,
and helps you write better code.  Here warnings of all sorts are useful,
as it is better to find potential or real problems as early as possible
in the development process.  Even warnings about style are important
because they improve the long-term maintainability of the code.

gcc can also be used to build existing code - for putting together
distributions, installing on your own machine, etc.  Here flags such as
"-march=native" can be useful but non-critical warnings are not, because
the person (or program) running the compiler is not a developer of the
code.  This use is as a "system C compiler".

Is it possible to distinguish these uses, and then have different
default flags?  Perhaps something as simple as looking at the name used
to call the compiler - "cc" or "gcc" ?


Re: A working GIMPLE simple IPA case to run?

2021-02-20 Thread Shuai Wang via Gcc
Thank you very much! Just a follow-up question regarding IPA.

Currently I can follow the tree-profile.c sample to perform IPA. However,
my analysis is limited to the functions within one .c file. Is it possible
for me to do analysis across .c files? That is, suppose there is a function
foo in source1.c that calls a function bar in source2.c. How can I analyze
foo and then bar across the two files?

Thank you!

Best,
Shuai



On Wed, Feb 17, 2021 at 4:04 PM Martin Liška  wrote:

> On 2/17/21 5:21 AM, Shuai Wang via Gcc wrote:
> > Could anyone shed some light on this? Thank you very much!
>
> Hello.
>
> I would recommend looking at any of the existing passes:
> $ git grep SIMPLE_IPA_PASS
> ...
>
> One reasonable example can be gcc/tree-profile.c.
>
> Cheers,
> Martin
>