On Wed, Jul 24, 2019 at 1:41 AM Akshat Garg <xks...@gmail.com> wrote:

> Hi all,
>
> I have tried to make the dependent_ptr qualification act as volatile
> during the RTL passes to bypass the RTL optimizations for now. Here is the
> patch
> https://github.com/AKG001/gcc/commit/14c05ae546554f822f667fdb72080b7fe52fea32
> For this (https://github.com/AKG001/test/blob/master/dependency_breaking.c)
> example, I have been able to get this (
> https://github.com/AKG001/test/blob/master/dependency_breaking.c.317r.final)
> .final code. I believe that there are no dependencies that are broken.
> Hoping someone else could also confirm if I have mistaken somewhere. I have
> given the dependent_ptr qualification to only memory type objects for now.
> The flag /d represents that this memory represents the dependent pointer in
> .final code.
> I am checking up the other examples form document P0190R4 to see what
> other things need to be done. Let me know what you guys think.
>
Sorry, I forgot to give the .optimized code for the above example. Here, it
is
https://github.com/AKG001/test/blob/master/dependency_breaking.c.231t.optimize
<https://github.com/AKG001/test/blob/master/dependency_breaking.c.231t.optimized>
May be needed.

>
> -Akshat
>
> On Mon, Jul 22, 2019 at 3:16 PM Richard Biener <richard.guent...@gmail.com>
> wrote:
>
>> On Mon, Jul 22, 2019 at 10:54 AM Akshat Garg <xks...@gmail.com> wrote:
>>
>>> On Mon, Jul 22, 2019 at 2:11 PM Richard Biener <
>>> richard.guent...@gmail.com> wrote:
>>>
>>>> On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg <xks...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>> Consider part of an example(figure 20) from doc P0190R4(
>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf)
>>>>> shown below:
>>>>>
>>>>> 1.  void thread1 (void)
>>>>> 2.  {
>>>>> 3.    int * volatile p;
>>>>> 4.    p = rcu_dereference(gip);
>>>>> 5.    if (p)
>>>>> 6.    assert(*(p+p[0]) == 42);
>>>>> 7.  }
>>>>> The .gimple code produced is :
>>>>>
>>>>> 1.  thread1 ()
>>>>> 2.  {
>>>>> 3.   atomic int * D.1992;
>>>>> 4.    int * volatile p;
>>>>> 5.  {
>>>>> 6.    atomic int * * __atomic_load_ptr;
>>>>> 7.   atomic int * __atomic_load_tmp;
>>>>> 8.    try
>>>>> 9.     {
>>>>> 10.        __atomic_load_ptr = &gip;
>>>>> 11.        _1 = __atomic_load_8 (__atomic_load_ptr, 1);
>>>>> 12.        _2 = (atomic int *) _1;
>>>>> 13.        __atomic_load_tmp = _2;
>>>>> 14.        D.1992 = __atomic_load_tmp;
>>>>> 15.     }
>>>>> 16.    finally
>>>>> 17.      {
>>>>> 18.        __atomic_load_tmp = {CLOBBER};
>>>>> 19.      }
>>>>> 20.  }
>>>>> 21.   p = D.1992;
>>>>> 22.   p.2_3 = p;
>>>>> 23.  if (p.2_3 != 0B) goto <D.1994>; else goto <D.1995>;
>>>>> 24.  <D.1994>:
>>>>> 25.   p.3_4 = p;
>>>>> 26.  p.4_5 = p;
>>>>> 27.   _6 = *p.4_5;
>>>>> 28.  _7 = (long unsigned int) _6;
>>>>> 29.  _8 = _7 * 4;
>>>>> 30.  _9 = p.3_4 + _8;
>>>>> 31.  _10 = *_9;
>>>>> 32.  _11 = _10 == 42;
>>>>> 33.  _12 = (int) _11;
>>>>> 34.  assert (_12);
>>>>> 35.  <D.1995>:
>>>>> 36. }
>>>>>
>>>>> assert at line 34 in .gimple code still breaks the dependency given by
>>>>> the user. I believe, there should be some ssa defined variable of p or p
>>>>> itself in assert. This is happening when I am considering pointer as
>>>>> volatile qualified. If I consider it as _Dependent_ptr qualified then it
>>>>> surely produces the broken dependency code. Let me know, if I wrong
>>>>> somewhere.
>>>>>
>>>>>
>>>> p appears as memory here which we load its value to p.3_4 which we then
>>>> offset by _8 and load from the
>>>> computed address into _10 which then appears in the assert condition.
>>>> I think that's as good as it can
>>>> get ...
>>>>
>>>> Richard.
>>>>
>>>
>>> Thank you for your reply. For, the same example above, consider this (
>>> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L402)
>>> instruction at rtl level changed form this (
>>> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L231)
>>> during the cse pass. The variable p.2_3 gets replaced by a temporary _1 but
>>> _1 is not any dependent pointer where, p.2_3 was. Is this also not breaking
>>> any dependencies
>>>
>>
>> I'm not sure.  In general CSE can break dependences.  If the dependent
>> pointer chain needs to conver multiple levels of
>> indirections from the original atomic operation you need to make sure to
>> not expose atomics as CSEable.  Thus on
>> RTL have them all UNSPECs.
>>
>> Richard.
>>
>>
>>> -Akshat
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg <xks...@gmail.com> wrote:
>>>>>
>>>>>> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill <ja...@redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney <
>>>>>>> paul...@linux.ibm.com> wrote:
>>>>>>> >
>>>>>>> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
>>>>>>> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg <xks...@gmail.com>
>>>>>>> wrote:
>>>>>>> > >
>>>>>>> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
>>>>>>> > > > ramana....@googlemail.com> wrote:
>>>>>>> > > >
>>>>>>> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg <
>>>>>>> xks...@gmail.com> wrote:
>>>>>>> > > >> >
>>>>>>> > > >> > As we have some working front-end code for _Dependent_ptr,
>>>>>>> What should
>>>>>>> > > >> we do next? What I understand, we can start adding the
>>>>>>> library for
>>>>>>> > > >> dependent_ptr and its functions for C corresponding to the
>>>>>>> ones we created
>>>>>>> > > >> as C++ template library. Then, after that, we can move on to
>>>>>>> generating the
>>>>>>> > > >> assembly code part.
>>>>>>> > > >> >
>>>>>>> > > >>
>>>>>>> > > >>
>>>>>>> > > >> I think the next step is figuring out how to model the
>>>>>>> Dependent
>>>>>>> > > >> pointer information in the IR and figuring out what
>>>>>>> optimizations to
>>>>>>> > > >> allow or not with that information. At this point , I suspect
>>>>>>> we need
>>>>>>> > > >> a plan on record and have the conversation upstream on the
>>>>>>> lists.
>>>>>>> > > >>
>>>>>>> > > >> I think we need to put down a plan on record.
>>>>>>> > > >>
>>>>>>> > > >> Ramana
>>>>>>> > > >
>>>>>>> > > > [CCing gcc mailing list]
>>>>>>> > > >
>>>>>>> > > > So, shall I start looking over the pointer optimizations only
>>>>>>> and see what
>>>>>>> > > > information we may be needed on the same examples in the IR
>>>>>>> itself?
>>>>>>> > > >
>>>>>>> > > > - Akshat
>>>>>>> > > >
>>>>>>> > > I have coded an example where equality comparison kills
>>>>>>> dependency from the
>>>>>>> > > document P0190R4 as shown below :
>>>>>>> > >
>>>>>>> > > 1. struct rcutest rt = {1, 2, 3};
>>>>>>> > > 2. void thread0 ()
>>>>>>> > > 3. {
>>>>>>> > > 4.     rt.a = -42;
>>>>>>> > > 5.     rt.b = -43;
>>>>>>> > > 6.     rt.c = -44;
>>>>>>> > > 7.     rcu_assign_pointer(gp, &rt);
>>>>>>> > > 8. }
>>>>>>> > > 9.
>>>>>>> > > 10. void thread1 ()
>>>>>>> > > 11. {
>>>>>>> > > 12.    int i = -1;
>>>>>>> > > 13.    int j = -1;
>>>>>>> > > 14.    _Dependent_ptr struct rcutest *p;
>>>>>>> > > 15.
>>>>>>> > > 16.    p = rcu_dereference(gp);
>>>>>>> > > 17.    j = p->a;
>>>>>>> > > 18.   if (p == &rt)
>>>>>>> > > 19.        i = p->b;  /*Dependency breaking point*/
>>>>>>> > > 20.   else if(p)
>>>>>>> > > 21.       i = p->c;
>>>>>>> > > 22.   assert(i<0);
>>>>>>> > > 23.   assert(j<0);
>>>>>>> > > 24. }
>>>>>>> > > The gimple unoptimized code produced for lines 17-24 is shown
>>>>>>> below
>>>>>>> > >
>>>>>>> > > 1. if (p_16 == &rt)
>>>>>>> > > 2.     goto <bb 3>; [INV]
>>>>>>> > > 3.   else
>>>>>>> > > 4.    goto <bb 4>; [INV]
>>>>>>> > > 5.
>>>>>>> > > 6.  <bb 3> :
>>>>>>> > > 7.  i_19 = p_16->b;
>>>>>>> > > 8.  goto <bb 6>; [INV]
>>>>>>> > > 9.
>>>>>>> > > 10.  <bb 4> :
>>>>>>> > > 11.  if (p_16 != 0B)
>>>>>>> > > 12.    goto <bb 5>; [INV]
>>>>>>> > > 13.  else
>>>>>>> > > 14.    goto <bb 6>; [INV]
>>>>>>> > > 15.
>>>>>>> > > 16.  <bb 5> :
>>>>>>> > > 17.  i_18 = p_16->c;
>>>>>>> > > 18.
>>>>>>> > > 19.  <bb 6> :
>>>>>>> > > 20.  # i_7 = PHI <i_19(3), i_8(4), i_18(5)>
>>>>>>> > > 21.  _3 = i_7 < 0;
>>>>>>> > > 22.  _4 = (int) _3;
>>>>>>> > > 23.  assert (_4);
>>>>>>> > > 24.  _5 = j_17 < 0;
>>>>>>> > > 25.  _6 = (int) _5;
>>>>>>> > > 26.  assert (_6);
>>>>>>> > > 27.  return;
>>>>>>> > >
>>>>>>> > > The optimized code after -O1 is applied for the same lines is
>>>>>>> hown below :
>>>>>>> > >
>>>>>>> > > 1. if (_2 == &rt)
>>>>>>> > > 2.    goto <bb 3>; [30.00%]
>>>>>>> > > 3. else
>>>>>>> > > 4.    goto <bb 4>; [70.00%]
>>>>>>> > > 5.
>>>>>>> > > 6.  <bb 3> [local count: 322122547]:
>>>>>>> > > 7.   i_12 = rt.b;
>>>>>>> > > 8.   goto <bb 6>; [100.00%]
>>>>>>> > > 9.
>>>>>>> > > 10.  <bb 4> [local count: 751619277]:
>>>>>>> > > 11.   if (_1 != 0)
>>>>>>> > > 12.   goto <bb 5>; [50.00%]
>>>>>>> > > 13.   else
>>>>>>> > > 14.    goto <bb 6>; [50.00%]
>>>>>>> > > 15.
>>>>>>> > > 16.  <bb 5> [local count: 375809638]:
>>>>>>> > > 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
>>>>>>> > > 18.
>>>>>>> > > 19.   <bb 6> [local count: 1073741824]:
>>>>>>> > > 20.  # i_7 = PHI <i_12(3), i_11(5), -1(4)>
>>>>>>> > > 21.   _3 = i_7 < 0;
>>>>>>> > > 22.   _4 = (int) _3;
>>>>>>> > > 23.   assert (_4);
>>>>>>> > > 24.  _5 = j_10 < 0;
>>>>>>> > > 25.  _6 = (int) _5;
>>>>>>> > > 26.   assert (_6);
>>>>>>> > > 27.   return;
>>>>>>> >
>>>>>>> > Good show on tracing this through!
>>>>>>> >
>>>>>>> > > Statement 19 in the program gets converted from  i_19 = p_16->b;
>>>>>>> in line 7
>>>>>>> > > in unoptimized code to i_12 = rt.b; in line 7 in optimized code
>>>>>>> which
>>>>>>> > > breaks the dependency chain. We need to figure out the pass that
>>>>>>> does that
>>>>>>> > > and put some handling code in there for the _dependent_ptr
>>>>>>> qualified
>>>>>>> > > pointers. Passing simply -fipa-pure-const,
>>>>>>> -fguess-branch-probability or
>>>>>>> > > any other option alone does not produce the optimized code that
>>>>>>> breaks the
>>>>>>> > > dependency. But applying -O1, i.e., allowing all the
>>>>>>> optimizations does so.
>>>>>>> > > As passes are applied in a certain order, we need to figure out
>>>>>>> up to what
>>>>>>> > > passes, the code remains same and after what pass the dependency
>>>>>>> does not
>>>>>>> > > holds. So, we need to check the translated code after every pass.
>>>>>>> > >
>>>>>>> > > Does this sounds like a workable plan for ? Let me know your
>>>>>>> thoughts. If
>>>>>>> > > this sounds good then, we can do this for all the optimizations
>>>>>>> that may
>>>>>>> > > kill the dependencies at somepoint.
>>>>>>> >
>>>>>>> > I don't know of a better plan.
>>>>>>> >
>>>>>>> > My usual question...  Is there some way to script the checking of
>>>>>>> the
>>>>>>> > translated code at the end of each pass?
>>>>>>>
>>>>>>> The usual way to check the output of an optimization pass is by
>>>>>>> dumping the intermediate code at that point and matching the dump
>>>>>>> against a regexp, as in the tree-ssa directories in the testsuite.
>>>>>>> -fdump-tree-all will dump after all the gimple optimization passes,
>>>>>>> and you can look through them until you find the breakage.
>>>>>>>
>>>>>>> Jason
>>>>>>>
>>>>>> Thanks Jason for the advice, I followed it.
>>>>>>
>>>>>> I coded an example shown in figure 26 from here (
>>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf).
>>>>>> The example:
>>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c
>>>>>> The .objsz1 code:
>>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.027t.objsz1
>>>>>> The .ccp1 code:
>>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.028t.ccp1
>>>>>> The .sra code:
>>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.114t.sra
>>>>>> The .dom2 code:
>>>>>> https://github.com/AKG001/test/blob/master/dependency_breaking.c.116t.dom2
>>>>>>
>>>>>> Comparing the .objsz1 and .ccp1 file, the "struct p" gets removed
>>>>>> after constant copy propagation (ccp) pass(
>>>>>> https://github.com/gcc-mirror/gcc/blob/master/gcc/passes.def#L77).
>>>>>> In .objsz1 file at line 53, the dependencies start with
>>>>>> p_16 = _14;
>>>>>> .......
>>>>>> i_19 = p_16->b;
>>>>>> ......
>>>>>> as user specified and after that all the references are made through
>>>>>> struct p or its variants. But in .ccp1 file at line 41, the "struct p" is
>>>>>> replaced with variable _2, now, variable _2 starts the dependencies as we
>>>>>> can see.
>>>>>> _2 = (atomic struct rcutest *) _1;
>>>>>> ...........
>>>>>> i_19 = MEM[(struct rcutest *)_2].b;
>>>>>> ..........
>>>>>> I don't know whether it is okay to say at this point, the
>>>>>> dependencies are still maintained or not. Or we have to pass the 
>>>>>> dependent
>>>>>> pointer qualification to new variables that replaces them. Hoping someone
>>>>>> could clarify.
>>>>>>
>>>>>> Also, In .sra file at line 43 in thread1(), the statement is:
>>>>>>     i_12 = MEM[(struct rcutest *)_2].b;
>>>>>> In .dom2 file at line 61, the above statement gets converted to:
>>>>>>     i_12 = rt.b;
>>>>>> So, I believe that sure does breaks the dependency specified by the
>>>>>> user. For this, we need to do some modifications in tree-ssa-dom.c file.
>>>>>> Hope this makes sense.
>>>>>>
>>>>>> Thanks,
>>>>>> Akshat
>>>>>>
>>>>>

Reply via email to