Re: daily report on extending static analyzer project [GSoC]

David Malcolm via Gcc Fri, 30 Jul 2021 07:49:01 -0700

On Fri, 2021-07-30 at 18:11 +0530, Ankur Saini wrote:
> 
> 
> > On 30-Jul-2021, at 5:35 AM, David Malcolm <dmalc...@redhat.com>
> > wrote:
> > 
> > On Thu, 2021-07-29 at 18:20 +0530, Ankur Saini wrote:


[..snip...]
> > > 
> 
> > 
> > > @@ -1242,6 +1243,17 @@ exploded_node::on_stmt (exploded_graph &eg,
> > >         unknown_side_effects = false;
> > >     }
> > > 
> > > +  /* If the statmement is a polymorphic call then assume 
> > > +     there are no side effects.  */
> > > +  gimple *call_stmt = const_cast<gimple *>(stmt);
> > > +  if (gcall *call = dyn_cast<gcall *> (call_stmt))
> > > +  {
> > > +    function *fun = this->get_function();
> > > +    cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge
> > > (call);
> > > +    if ((e && e->indirect_info) && (e->indirect_info-
> > > >polymorphic))
> > > +        unknown_side_effects = false;
> > > +  }
> > > +
> > 
> > This seems wrong; surely it depends on what the call is - or am I
> > missing something?  Is the issue that we're speculating lots of
> > possibilities as dynamic calls?  If so, would it be better to
> > terminate
> > the remaining analysis path (if that makes sense), and assume that
> > any
> > further analysis happens on extra edges added for the speculated
> > calls?
> 
> Actually the issue here was, the analyzer was not able to find the body
> of callee function here and was treating all polymorphic calls as “call
> to unknown functions” and resetting all of the state machines and not
> generating desired diagnostics. I just changed it to assume there is no
> side effect of the call right now. 
> should I maybe eventually check for the unknown call later ( when the
> anayzer knows which function it is calling ) ? 

I'm not sure.  I think it at least warrants a "FIXME" style comment in
the code above.  It also suggests some more test cases, to cover the
calls does have side effects vs call doesn't have side effects: e.g. a
case where, say, A:foo() modifies a global variable, and B::foo()
doesn't; and something like:
  int saved_g = g;
  a->foo ();
  __analyzer_eval (saved_g == g);
could be used in DejaGnu to see if we know what got called.


> 
> > 
> > FWIW I've been experimenting with adding "bifurcation" support so
> > that
> > you can do:
> >  program_state *other = ctxt->bifurcate ();
> > and have it split the analysis into states (e.g. for handling
> > realloc,
> > so that we can split into 3 states: "succeeded", "succeeded but
> > moved",
> > "failed").  Unfortunately my code for this is a mess (it's a hacked
> > up
> > prototype).  Should I try to post what I have for this?
> 
> This surely looks like a thing which the project can take advantage of,
> maybe by bifurcating the analysis at virtual function calls. 

(nods)

I have some non-analyzer tasks to focus on today, so I'm not going to
have that code ready until next week.

[..snip...]

> > 
> > If we're speculating that a particular call happens, do we gain
> > information about what kind of object we're dealing with?
> 
> I think I used the wrong wording there, we are not quite "speculating”
> the calls, but are calling and analysing them instead.
> 
> > 
> > If we have a repeated call to the same vfunc, do we know that we're
> > calling the same function?
> 
> yes, this is a problem I didn’t foresee. When a call can have multiple
> targets, the analyser acts as if all of those functions are called from
> that point and then on second function call, it will do the same,
> leading to some weird diagnostic results. Looks like this is where the
> "bifurcation” might come in handy
> 
> But for a lot of simple cases, apparently the
> possible_polymorphic_call_targets () do the job good enough to only
> give out a single target accurately ( even when a base class pointer is
> used )
> 
> > 
> > [...snip...]
> > 
> > I'm interested in seeing what test cases you have.
> 
> I was testing it on a couple simple test programs ( which I think are
> not enough now ). 
> - - - 
> 1.  https://godbolt.org/z/qboq35bar <https://godbolt.org/z/qboq35bar> (
> analyser doesn’t generate any warnings as this one was just to check if
> the analyser is detecting calls correctly or not  )
> 
> super graph and exploded graph :-

FWIW some of these examples are hitting the complexity limit in
eg_traits::dump_args_t::show_enode_details_p and thus not showing the
details, (which is probably a good thing; we don't want to be sending
huge attachments to the list).  Unfortunately, beyond a certain point,
the .dot dumps get unreadable.

Looking at test_1.cpp.eg.dot I see that after the constructor runs, the
state has e.g.:

  cluster for A a: (&constexpr int (* A::_ZTV1A [4])(...)+(sizetype)16)

which I think means that we "know" that the object vtable is the vtable
for A i.e. that this object is an A:

  $ echo "_ZTV1A" | c++filt
  vtable for A

The cluster dump is simplified if the value is for the whole object,
which is happening here, so the dump might be clearer with an example
that adds some member data in the ctor.  For that case, in the state
after the ctor returns I expect you will see the cluster has a ptr to
the vtable, then the member data.

So in an interprocedural example where a ctor runs, the state should
capture what actual subclass the instance is after the ctor has
returned.

I wonder if it's possible to make use of that knowledge when handling a
virtual function call.

An example might be:

struct A
{
  A (int data) : m_data (data) {}
  virtual char foo () const { return 'A'; }
  int m_data;
};
class B : public A
{

  B (int data) : A (data) {}
  virtual char foo () const { return 'B'; }
  int m_data;
}

A *make_a (int v)  __atttribute__(noinline)) { return new A(v); }
B *make_b (int v)  __atttribute__(noinline)) { return new B(v); }

void test_base (int v)
{
  A *base = make_a (v);
  __analyzer_eval (base->foo () == 'A'); // should be TRUE
  delete base;
}

void test_sub (int v)
{
  A *sub = make_b (v);
  __analyzer_eval (sub->foo () == 'B'); // should be TRUE
  delete sub;
}

where the "noinline" factory functions should hide the specific
subclasses from the optimizer, so the analyzer has to rely on the
vtables bound to the clusters in the store, but in theory can tell that
"base" is specifically an A, and that "sub" is specifically a B, and
hence "know" exactly what is called at each foo vfunc callsite.

(caveat: untested code)

Hope the above makes sense.

Dave

Re: daily report on extending static analyzer project [GSoC]

Reply via email to