Re: daily report on extending static analyzer project [GSoC]

David Malcolm via Gcc Fri, 16 Jul 2021 14:27:47 -0700

On Fri, 2021-07-16 at 21:04 +0530, Ankur Saini wrote:
> 
> 
> > On 15-Jul-2021, at 4:53 AM, David Malcolm <dmalc...@redhat.com>
> > wrote:
> > 
> > On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
> > 
> >


[...snip...]

> > 
> > > 
> > >   2. ( pr100546.c <   
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 < 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546>>)
> > >   ```
> > >   #include <stdio.h>
> > >   #include <cstdlib.h>
> > >   
> > >   static void noReturn(const char *str) __attribute__((noreturn));
> > >   static void noReturn(const char *str) {
> > >       printf("%s\n", str);
> > >       exit(1);
> > >   }
> > >   
> > >   void (*noReturnPtr)(const char *str) = &noReturn;
> > >   
> > >   int main(int argc, char **argv) {
> > >       char *str = 0;
> > >       if (!str)
> > >           noReturnPtr(__FILE__);
> > >       return printf("%c\n", *str);
> > >   }
> > >   ```
> > >   (godbolt link <https://godbolt.org/z/aWfW51se3 < 
> > > https://godbolt.org/z/aWfW51se3>>)
> > > 
> > > - But at the time of testing ( command used 
> > >   was `make check-gcc RUNTESTFLAGS="-v -v
> > > analyzer.exp=pr100546.c"`),
> > > both of 
> > >   them failed unexpectedly with Segmentation fault at the call
> > > 
> > > - From further inspection, I found out that this is due 
> > >   "-fanalyzer-call-summaries" option, which looks like activats
> > > call
> > > summaries
> > > 
> > > - I would look into this in more details ( with gdb ) tomorrow,
> > > right
> > > now 
> > >   my guess is that this is either due too the changes I did in
> > > state-
> > > purge.cc <http://purge.cc/>
> > >   or is a call-summary related problem ( I remember it not being 
> > >   perfetly implemented right now). 
> > 
> > I'm not proud of the call summary code, so that may well be the
> > problem.
> > 
> > Are you able to use gdb on the analyzer?  It ought to be fairly
> > painless to identify where a segfault is happening, so let me know if
> > you're running into any problems with that.
> 
> Yes, I used gdb on the analyzer to go into details and looks like I was
> correct, the program was crashing in “analysis_plan::use_summary_p ()”
> on line 114 ( const cgraph_node *callee = edge->callee; ) where program
> was trying to access callgraph edge which didn’t exist .
> 
> I fixed it by simply making analyzer abort using call summaries in
> absence of callgraph edge.
> 
> File: {src-dir}/gcc/analyzer/analysis-plan.cc
> 
> 105: bool
> 106: analysis_plan::use_summary_p (const cgraph_edge *edge) const
> 107: {
> 108:   /* Don't use call summaries if -fno-analyzer-call-summaries.  */
> 109:   if (!flag_analyzer_call_summaries)
> 110:     return false;
> 111: 
> 112:   /* Don't use call summaries if there is no callgraph edge */
> 113:   if(!edge || !edge->callee)
> 114:     return false;
> 
> and now the tests are passing successfully. ( both manually and via
> DejaGnu ).

Great.

> 
> I have attached a sample patch of work done till now with this mail for
> review ( I haven’t sent this one to the patches list as it’s change log
> was not complete for now ).
> 
> P.S. I have also sent another mail (   
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html <  
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html> ) to
> patches list with the previous call-string patch and this time it
> popped up in my inbox as it should, did you also received it now ?

I can see it in the archive URL, but for some reason it's not showing
up in my inbox.  Bother.  Please can you try resending it directly to
me?

Putting email issues to one side, the patch you linked to above looks
good.  To what extent has it been tested?  If it bootstraps and passes
the test suite, it's ready for trunk.

Note that over the last couple of days I pushed my "use of
uninitialized values" detection work to trunk (aka master), along with
various other changes, so it's worth pulling master and rebasing on top
of that before testing.  I *think* we've been touching different parts
of the analyzer code, but there's a chance you might have to resolve
some merge conflicts.

As for the patch you attached to this email
  "[PATCH] analyzer: make analyer detect calls via function pointers"
here's an initial review:

> diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
> index 7dfc48e9c3e..1c7e4d2cc84 100644
> --- a/gcc/analyzer/analysis-plan.cc
> +++ b/gcc/analyzer/analysis-plan.cc
> @@ -109,6 +109,10 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) 
> const
>    if (!flag_analyzer_call_summaries)
>      return false;
>  
> +  /* Don't use call summaries if there is no callgraph edge */
> +  if(!edge || !edge->callee)
> +    return false;

Is it possible for a cgraph_edge to have a NULL callee?  (I don't think
so, but I could be wrong)

Nit: missing space between "if" and open-paren

> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> index 7662a7f7bab..f45a614c0ab 100644
> --- a/gcc/analyzer/engine.cc
> +++ b/gcc/analyzer/engine.cc
> @@ -3170,10 +3170,13 @@ exploded_graph::process_node (exploded_node *node)
>        break;
>      case PK_AFTER_SUPERNODE:
>        {
> +        bool found_a_superedge = false;
> +        bool is_an_exit_block = false;
>       /* If this is an EXIT BB, detect leaks, and potentially
>          create a function summary.  */
>       if (point.get_supernode ()->return_p ())
>         {
> +         is_an_exit_block = true;
>           node->detect_leaks (*this);
>           if (flag_analyzer_call_summaries
>               && point.get_call_string ().empty_p ())
> @@ -3201,6 +3204,7 @@ exploded_graph::process_node (exploded_node *node)
>       superedge *succ;
>       FOR_EACH_VEC_ELT (point.get_supernode ()->m_succs, i, succ)
>         {
> +         found_a_superedge = true;
>           if (logger)
>             logger->log ("considering SN: %i -> SN: %i",
>                          succ->m_src->m_index, succ->m_dest->m_index);
> @@ -3210,19 +3214,100 @@ exploded_graph::process_node (exploded_node *node)
>                                                point.get_call_string ());
>           program_state next_state (state);
>           uncertainty_t uncertainty;
> +
> +         /* Check if now the analyzer know about the call via 
> +               function pointer or not. */
> +            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL && 
> +                !(succ->get_any_callgraph_edge()))

Some formatting nits:
- on a multiline conditional, our convention is to put the "&&" at the
start of a line, rather than the end
- missing space beween function name and open-paren in call.

Hence this should be formatted as:

            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL
                && !(succ->get_any_callgraph_edge ()))


> +              {    
> +                const program_point *this_point = &node->get_point();
> +                const program_state *this_state = &node->get_state ();
> +                const gcall *call 
> +                  = this_point->get_supernode ()->get_final_call ();
> +
> +                impl_region_model_context ctxt (*this,
> +                  node, 
> +                  this_state, 
> +                  &next_state, 
> +                  &uncertainty,
> +                  this_point->get_stmt());
> +
> +                region_model *model = this_state->m_region_model;
> +                tree fn_decl = model->get_fndecl_for_call(call,&ctxt);
> +                function *fun = DECL_STRUCT_FUNCTION(fn_decl);
> +                if(fun)
> +                {
> +                  const supergraph *sg = &(this->get_supergraph());

Might as well turn this into

> +                  const supergraph &sg = get_supergraph ();

and update various "sg->" into "sg." below.

> +                  supernode * sn_entry = sg->get_node_for_function_entry 
> (fun);
> +                  supernode * sn_exit = sg->get_node_for_function_exit (fun);
> +
> +                  program_point new_point 
> +                    = program_point::before_supernode (sn_entry,
> +                                                    NULL,
> +                                                    point.get_call_string 
> ());
> +
> +                  new_point.push_to_call_stack (sn_exit,
> +                                             next_point.get_supernode());
> +
> +                  next_state.push_call(*this, node, call, &uncertainty);

Some logging here could be helpful.


[...]

> +
> +    /* Return from the calls which doesn't have a return superedge.
> +     Such case occurs when GCC's middle end didn't knew which function to
> +     call but analyzer did */
> +    if((is_an_exit_block && !found_a_superedge) && 
> +       (!point.get_call_string().empty_p()))

Similar formatting nits as above; this should look something like:

    if ((is_an_exit_block && !found_a_superedge)
         && !point.get_call_string ().empty_p ())


[...snip...]

>  
> +/* do exatly what region_model::update_for_return_superedge() do
> +   but get the call info from CALL_STMT instead from a suerpedge and 
> +   is availabe publicically   */
> +void
> +region_model::update_for_return_gcall (const gcall *call_stmt,
> +                                            region_model_context *ctxt)
> +{
> +  /* Get the region for the result of the call, within the caller frame.  */
> +  const region *result_dst_reg = NULL;
> +  tree lhs = gimple_call_lhs (call_stmt);
> +  if (lhs)
> +    {
> +      /* Normally we access the top-level frame, which is:
> +         path_var (expr, get_stack_depth () - 1)
> +         whereas here we need the caller frame, hence "- 2" here.  */
> +      gcc_assert (get_stack_depth () >= 2);
> +      result_dst_reg = get_lvalue (path_var (lhs, get_stack_depth () - 2),
> +                                ctxt);
> +    }
> +
> +  pop_frame (result_dst_reg, NULL, ctxt);
> +}

I *think* this can be consolidated with update_for_return_superedge. 
Maybe instead, just rename that to update_for_return, and update the
callsite of update_for_return_superedge to get the call_stmt there.

That way we avoid copy&paste repetition of code.

I think something similar can be done for
region_model::update_for_gcall and update_for_call_superedge.

[...snip...]

> diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
> index 8611d0f8689..ace9e7b128a 100644
> --- a/gcc/analyzer/supergraph.cc
> +++ b/gcc/analyzer/supergraph.cc
> @@ -183,11 +183,34 @@ supergraph::supergraph (logger *logger)
>             m_stmt_to_node_t.put (stmt, node_for_stmts);
>             m_stmt_uids.make_uid_unique (stmt);
>             if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
> -             {
> -               m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> -               node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), 
> NULL);
> -               m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
> -             }
> +             {
> +               m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> +               node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt),
> +                                          NULL);
> +               m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
> +             }
> +             else
> +             {
> +               // maybe call is via a function pointer
> +               gcall *call = dyn_cast<gcall *> (stmt);
> +               if (call)

You can combine the if and the decl/dyn_cast into one line:

                  if (gcall *call = dyn_cast<gcall *> (stmt))

which saves a little vertical space and (slightly) limits the scope of
"call".

> +               {
> +                 cgraph_edge *edge 
> +                   = cgraph_node::get (fun->decl)->get_edge (stmt);
> +                 if (!edge || !edge->callee)
> +                 {
> +                   supernode *old_node_for_stmts = node_for_stmts;
> +                   node_for_stmts = add_node (fun, bb, call, NULL);
> +
> +                   superedge *sedge 
> +                     = new callgraph_superedge (old_node_for_stmts,
> +                                                node_for_stmts,
> +                                                
> SUPEREDGE_INTRAPROCEDURAL_CALL,
> +                                                NULL);
> +                   add_edge (sedge);
> +                 }
> +               }
> +             }
>           }

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c 
> b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> new file mode 100644
> index 00000000000..1054ab4f240
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> @@ -0,0 +1,53 @@
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +void fun(int *int_ptr)
> +{
> +     free(int_ptr);  /* { dg-warning "double-'free' of 'int_ptr'" } */
> +}
> +
> +void single_call()
> +{
> +     int *int_ptr = (int*)malloc(sizeof(int));
> +     void (*fun_ptr)(int *) = &fun;
> +     (*fun_ptr)(int_ptr);
> +}
> +
> +void double_call()
> +{
> +     int *int_ptr = (int*)malloc(sizeof(int));
> +     void (*fun_ptr)(int *) = &fun;
> +     (*fun_ptr)(int_ptr);
> +     (*fun_ptr)(int_ptr);
> +}
> +
> +/*{ dg-begin-multiline-output "" }

Normally the test suite injects "-fdiagnostics-plain-output" into the
options, which turns off the source code printing and ASCII art for
interprocedural paths.  If you want to test them via dg-begin-
multiline-output you'll need to have a look at the directives at the
top of one of the existing tests that uses dg-begin-multiline-output.

> +    6 |         free(int_ptr);
> +      |         ^~~~~~~~~~~~~
> +  'double_call': events 1-2
> +    |
> +    |   16 | void double_call()
> +    |      |      ^~~~~~~~~~~
> +    |      |      |
> +    |      |      (1) entry to 'double_call'
> +    |   17 | {
> +    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
> +    |      |                              ~~~~~~~~~~~~~~~~~~~
> +    |      |                              |
> +    |      |                              (2) allocated here
> +    |
> +    +--> 'fun': events 3-6
> +           |
> +           |    4 | void fun(int *int_ptr)
> +           |      |      ^~~
> +           |      |      |
> +           |      |      (3) entry to ‘fun’
> +           |      |      (5) entry to ‘fun’
> +           |    5 | {
> +           |    6 |         free(int_ptr);
> +           |      |         ~~~~~~~~~~~~~
> +           |      |         |
> +           |      |         (4) first 'free' here
> +           |      |         (6) second 'free' here; first 'free' was at (4)
> +           |
> +*/

and normally there would be a terminating:

  { dg-end-multiline-output "" } */

In any case, the output above is missing some events: I think ideally
it would show the calls from *fun_ptr to fun and the returns, giving
something like the following (which I mocked up by hand):

  'double_call': events 1-3
    |
    |   16 | void double_call()
    |      |      ^~~~~~~~~~~
    |      |      |
    |      |      (1) entry to 'double_call'
    |   17 | {
    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
    |      |                              ~~~~~~~~~~~~~~~~~~~
    |      |                              |
    |      |                              (2) allocated here
    | ...  |
    |   19 |         (*fun_ptr)(int_ptr);
    |      |         ^~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) calling 'fun' from 'double-call'
    |
    +--> 'fun': events 3-6
           |
           |    4 | void fun(int *int_ptr)
           |      |      ^~~
           |      |      |
           |      |      (4) entry to ‘fun’
           |    5 | {
           |    6 |         free(int_ptr);
           |      |         ~~~~~~~~~~~~~
           |      |         |
           |      |         (5) first 'free' here
           |
    <------+
    |
  'double_call': events 6-7
    |
    |   19 |         (*fun_ptr)(int_ptr);
    |      |         ^~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (6) returning to 'double-call' from 'fun'
    |   20 |         (*fun_ptr)(int_ptr);
    |      |         ^~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (7) calling 'fun' from 'double-call'
    |
    +--> 'fun': events 8-9
           |
           |    4 | void fun(int *int_ptr)
           |      |      ^~~
           |      |      |
           |      |      (8) entry to ‘fun’
           |    5 | {
           |    6 |         free(int_ptr);
           |      |         ~~~~~~~~~~~~~
           |      |         |
           |      |         (9) second 'free' here; first 'free' was at (5)

The events are created in diagnostic-manager.cc

Am I right in thinking that there's a interprocedural superedge for the
dynamically-discovered calls?

diagnostic_manager::add_events_for_superedge creates events for calls
and returns, so maybe you just need to update the case
SUPEREDGE_INTRAPROCEDURAL_CALL there, to do something for the
"dynamically discovered edge" cases (compare it with the other cases in
that function).   You might need to update the existing call_event and
return_event subclasses slightly (see checker-path.cc/h)

Ideally, event (7) above would read
  "passing freed pointer 'int_ptr' in call to 'fun from 'double_call"
but that's advanced material :)

[...snip...]

Hope this makes sense and is constructive
Dave

Re: daily report on extending static analyzer project [GSoC]

Reply via email to