> On 06-Aug-2021, at 4:39 AM, David Malcolm <dmalc...@redhat.com> wrote:
> 
> On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
>> 
>> 
>>> On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalc...@redhat.com>
>>> wrote:
>>> 
>>> On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
>>> 
>>> [...snip...]
>>>> 
>>>> - From observation, a typical vfunc call that isn't devirtualised
>>>> by
>>>> the compiler's front end looks something like this 
>>>> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
>>>> where "a_ptr_5(D)" is the pointer that is being used to call the
>>>> virtual
>>>> function.
>>>> 
>>>> - We can access its region to see what is the type of the object
>>>> the
>>>> pointer is actually pointing to.
>>>> 
>>>> - This is then used to find a call with DECL_CONTEXT of the object
>>>> from all the possible targets of that polymorphic call.
>>> 
>>> [...]
>>> 
>>>> 
>>>> Patch file ( prototype ) : 
>>>> 
>>> 
>>>> +  /* Call is possibly a polymorphic call.
>>>> +  
>>>> +     In such case, use devirtualisation tools to find 
>>>> +     possible callees of this function call.  */
>>>> +  
>>>> +  function *fun = get_current_function ();
>>>> +  gcall *stmt  = const_cast<gcall *> (call);
>>>> +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>>>> +  if (e->indirect_info->polymorphic)
>>>> +  {
>>>> +    void *cache_token;
>>>> +    bool final;
>>>> +    vec <cgraph_node *> targets
>>>> +      = possible_polymorphic_call_targets (e, &final, &cache_token, true);
>>>> +    if (!targets.is_empty ())
>>>> +      {
>>>> +        tree most_propbable_taget = NULL_TREE;
>>>> +        if(targets.length () == 1)
>>>> +           return targets[0]->decl;
>>>> +    
>>>> +        /* From the current state, check which subclass the pointer
>>>> +           used for this polymorphic call points to, and use that to
>>>> +           filter out the correct function call.  */
>>>> +        tree t_val = gimple_call_arg (call, 0);
>>> 
>>> Maybe rename to "this_expr"?
>>> 
>>> 
>>>> +        const svalue *sval = get_rvalue (t_val, ctxt);
>>> 
>>> and "this_sval"?
>> 
>> ok
>> 
>>> 
>>> ...assuming that that's what the value is.
>>> 
>>> Probably should reject the case where there are zero arguments.
>> 
>> Ideally it should always have one argument representing the pointer
>> used to call the function. 
>> 
>> for example, if the function is called like this : -
>> 
>> a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a
>>                   // pointer to an object of a subclass.
>> 
>> I saw that its GIMPLE representation is as follows : -
>> 
>> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
>> 
>>> 
>>> 
>>>> +
>>>> +        const region *reg
>>>> +          = [&]()->const region *
>>>> +              {
>>>> +                switch (sval->get_kind ())
>>>> +                  {
>>>> +                    case SK_INITIAL:
>>>> +                      {
>>>> +                        const initial_svalue *initial_sval
>>>> +                          = sval->dyn_cast_initial_svalue ();
>>>> +                        return initial_sval->get_region ();
>>>> +                      }
>>>> +                      break;
>>>> +                    case SK_REGION:
>>>> +                      {
>>>> +                        const region_svalue *region_sval 
>>>> +                          = sval->dyn_cast_region_svalue ();
>>>> +                        return region_sval->get_pointee ();
>>>> +                      }
>>>> +                      break;
>>>> +
>>>> +                    default:
>>>> +                      return NULL;
>>>> +                  }
>>>> +              } ();
>>> 
>>> I think the above should probably be a subroutine.
>>> 
>>> That said, it's not clear to me what it's doing, or that this is
>>> correct.
>> 
>> 
>> Sorry, I think I should have explained it earlier.
>> 
>> Let's take an example code snippet :- 
>> 
>> Derived d;
>> Base *base_ptr;
>> base_ptr = &d;
>> base_ptr->foo();        // where foo() is a virtual function
>> 
>> This generates the following GIMPLE dump :- 
>> 
>> Derived::Derived (&d);
>> base_ptr_6 = &d.D.3779;
>> _1 = base_ptr_6->_vptr.Base;
>> _2 = _1 + 8;
>> _3 = *_2;
>> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);
> 
> I did a bit of playing with this example, and tried adding:
> 
>       case OBJ_TYPE_REF:
>         gcc_unreachable ();
>         break;
> 
> to region_model::get_rvalue_1, and running cc1plus under the debugger.
> 
> The debugger hits the "gcc_unreachable ();", at this stmt:
> 
>     OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);
> 
> Looking at the region_model with region_model::debug() shows:
> 
> (gdb) call debug()
> stack depth: 1
>  frame (index 0): frame: ‘test’@1
> clusters within frame: ‘test’@1
>  cluster for: Derived d
>    key:   {bytes 0-7}
>    value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
>  cluster for: base_ptr_5: &Derived d.<anonymous>
>  cluster for: _2: &‘foo’
> m_called_unknown_fn: FALSE
> constraint_manager:
>  equiv classes:
>    ec0: {&Derived d.<anonymous>}
>    ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
>    ec2: {(void *)0B == [m_constant]‘0B’}
>    ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
>  constraints:
>    0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
>    1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void *)0B == [m_constant]‘0B’}
>    2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} != ec2: {(void *)0B == [m_constant]‘0B’}
> 
> i.e. it already "knows" that _2 is &'foo' for Derived::foo.
> 
> So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
> function pointer directly from the vtable for such cases, so something
> like:
> 
>    case OBJ_TYPE_REF:
>       {
>          tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
>          return get_rvalue (expr, ctxt); 
>       }
>       break;
> 
> might get the function pointer.

I tried it, and yes, it works like a charm. Thanks : )
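
For anyone wanting to reproduce this, a self-contained version of the
test case could look like the sketch below. The thread only shows the
four-line snippet, so the class definitions and the return values here
are assumptions made for illustration; check_test () is a hypothetical
helper that just confirms at run time that the dynamic type picks the
callee.

```cpp
#include <cassert>

/* Assumed class layout: a single virtual function, so that foo ()
   sits in vtable slot 0 as in the OBJ_TYPE_REF dump quoted below.  */
struct Base
{
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

/* Mirrors the snippet from the thread: a Derived on the stack,
   called through a Base pointer.  */
int test ()
{
  Derived d;
  Base *base_ptr;
  base_ptr = &d;
  return base_ptr->foo ();  /* polymorphic call; Derived::foo at run time */
}

/* Hypothetical helper: run-time sanity check of the expected callee.  */
void check_test ()
{
  assert (test () == 2);
}
```

Compiling something like this with -fanalyzer (and without optimisation
that devirtualises the call) should be enough to reach the OBJ_TYPE_REF
case discussed here.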

> 
> (caveat: untested code)
> 
>> 
>> Here, instead of trying to extract the virtual pointer from the call and
>> see which subclass it belongs to, I found it simpler to extract the
>> actual pointer that is used to call the function (which, from
>> observation, is always the first argument of the call) and use the
>> region model at that point to figure out the type of the object it
>> actually points to, and ultimately get the subclass whose function is
>> being called here. :)
>> 
>> Now let me try to explain how I actually executed it ( A lot of
>> assumptions here are based on observation, so please correct me
>> wherever you think I made a false interpretation or forgot about a
>> certain special case ) :
>> 
>> - Once it is confirmed that the call that we are dealing with is a
>> polymorphic call ( via the cgraph edge representing the call ), I used
>> "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined
>> in ipa-devirt.c ) to get the possible callees of that call.
>> 
>>   function *fun = get_current_function ();
>>   gcall *stmt  = const_cast<gcall *> (call);
>>   cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>>   if (e->indirect_info->polymorphic)
>>   {
>>     void *cache_token;
>>     bool final;
>>     vec <cgraph_node *> targets
>>       = possible_polymorphic_call_targets (e, &final, &cache_token, true);
>> 
>> - Now if the list contains more than one target, I make use of
>> the current enode's region model to get more info about the pointer
>> which was used to call the function.
>> 
>>         /* here I extract the pointer (which was used to call the
>> function), which from observation, is always the zeroth argument of the
>> call.  */
>>         tree t_val = gimple_call_arg (call, 0);
>>         const svalue *sval = get_rvalue (t_val, ctxt);
>> 
>> - In all the examples I used, the pointer is represented as a
>> region_svalue or as an initial_svalue (I think initial_svalue is the
>> case where the pointer is taken as a parameter of the current function
>> and the analyzer is analysing a top-level call to this function).
>> 
>> Here are some examples of both, where I used
>> __analyzer_describe () to show this:
>>  . (https://godbolt.org/z/Mqs8oM6ff)
>>  . (https://godbolt.org/z/z4sfTM3f5)
>> 
>>         /* here I extract the region that the pointer is pointing to,
>> and as both cases return a (const region *), I used a lambda to get
>> it ( If you want, I can turn this into a separate function to make it
>> more readable )  */
>> 
>>         const region *reg
>>           = [&]()->const region *
>>               {
>>                 switch (sval->get_kind ())
>>                   {
>>                     case SK_INITIAL:
>>                       {
>>                         const initial_svalue *initial_sval
>>                           = sval->dyn_cast_initial_svalue ();
>>                         return initial_sval->get_region ();
>>                       }
>>                       break;
>>                     case SK_REGION:
>>                       {
>>                         const region_svalue *region_sval 
>>                           = sval->dyn_cast_region_svalue ();
>>                         return region_sval->get_pointee ();
>>                       }
>>                       break;
>> 
>>                     default:
>>                       return NULL;
>>                   }
>>               } ();
>> 
>>         gcc_assert (reg);
>> 
>>         /* Now that I have the region, I tried to get the type of the
>> object it is holding and put it in ‘known_possible_subclass_type’.  */
>> 
>>         tree known_possible_subclass_type;
>>         known_possible_subclass_type = reg->get_type ();
>>         if (reg->get_kind () == RK_FIELD)
>>           {
>>              const field_region* field_reg = reg->dyn_cast_field_region ();
>>              known_possible_subclass_type 
>>                = DECL_CONTEXT (field_reg->get_field ());
>>           }
>> 
>> /* After that I iterated over the entire array of possible targets to
>> find the function whose scope ( DECL_CONTEXT (fn_decl) ) is the same
>> as the type of the object that the pointer is actually pointing
>> to.  */
>> 
>>         for (cgraph_node *x : targets)
>>           {
>>             if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>>               most_propbable_taget = x->decl;
>>           }
>>         return most_propbable_taget;
>>       }
>>    }
>> 
>> I tested it on all of the test programs I created and till now in all
>> of the cases, the analyzer is correctly determining the call. I am
>> currently in the process of creating more tests ( including multiple
>> types of inheritance ) to see how successful this implementation is.
> 
> I'm still skeptical of the above code; my feeling is that with more
> tests you'll find cases where it doesn't work.  Maybe dynamically
> allocated instances?

That’s what I was thinking, and that’s why I wanted to test it on more 
programs, but it looks like I don’t need this anymore.
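
For the record, the kind of cases raised above (a dynamically allocated
instance, and a pointer received as a parameter) could be probed with a
sketch like the one below. The class definitions, function names and
return values are assumptions made for illustration and do not come from
the patch; the sketch makes no claim about what the analyzer reports for
them, it only sets up the situations where the pointee is not a named
local object.

```cpp
#include <cassert>

/* Assumed hierarchy; the virtual destructor is only here so that
   deleting through a Base pointer is well defined.  */
struct Base
{
  virtual ~Base () {}
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

/* Heap-allocated instance: the pointee is not a declared variable.  */
int test_heap ()
{
  Base *base_ptr = new Derived;
  int result = base_ptr->foo ();
  delete base_ptr;
  return result;
}

/* Pointer taken as a parameter: when analysed as a top-level call,
   the dynamic type of *base_ptr is not known statically.  */
int test_param (Base *base_ptr)
{
  return base_ptr->foo ();
}
```

At run time test_heap () dispatches to Derived::foo, while test_param ()
depends on what the caller passes, which is why a lookup based on the
declared type of the pointee may not be enough for these cases.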

> 
> Hope this is constructive
> 
> Dave
> 

Thanks 
- Ankur
