Am Dienstag, dem 24.10.2023 um 22:51 +0000 schrieb Qing Zhao:
> 
> > On Oct 24, 2023, at 4:38 PM, Martin Uecker <uec...@tugraz.at> wrote:
> > 
> > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> > > Hi, Sid,
> > > 
> > > Really appreciate for your example and detailed explanation. Very helpful.
> > > I think that this example is an excellent example to show (almost) all 
> > > the issues we need to consider.
> > > 
> > > I slightly modified this example to make it to be compilable and 
> > > run-able, as following: 
> > > (but I still cannot make the incorrect reordering or DSE happening, 
> > > anyway, the potential reordering possibility is there…)
> > > 
> > >  1 #include <malloc.h>
> > >  2 struct A
> > >  3 {
> > >  4  size_t size;
> > >  5  char buf[] __attribute__((counted_by(size)));
> > >  6 };
> > >  7 
> > >  8 static size_t
> > >  9 get_size_from (void *ptr)
> > > 10 {
> > > 11  return __builtin_dynamic_object_size (ptr, 1);
> > > 12 }
> > > 13 
> > > 14 void
> > > 15 foo (size_t sz)
> > > 16 {
> > > 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * 
> > > sizeof(char));
> > > 18  obj->size = sz;
> > > 19  obj->buf[0] = 2;
> > > 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
> > > 21  return;
> > > 22 }
> > > 23 
> > > 24 int main ()
> > > 25 {
> > > 26  foo (20);
> > > 27  return 0;
> > > 28 }
> > > 
> > > With my GCC, it was compiled and worked:
> > > [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > 20
> > > Situation 1: With O1 and above, the routine “get_size_from” was inlined 
> > > into “foo”, therefore, the call to __bdos is in the same routine as the 
> > > instantiation of the object, and the TYPE information and the attached 
> > > counted_by attribute information in the TYPE of the object can be USED by 
> > > the __bdos call to compute the final object size. 
> > > 
> > > [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> > > [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> > > -1
> > > Situation 2: With O0, the routine “get_size_from” was NOT inlined into 
> > > “foo”, therefore, the call to __bdos is Not in the same routine as the 
> > > instantiation of the object, As a result, the TYPE info and the attached 
> > > counted_by info of the object can NOT be USED by the __bdos call. 
> > > 
> > > Keep in mind of the above 2 situations, we will refer them in below:
> > > 
> > > 1. First,  the problem we are trying to resolve is:
> > > 
> > > (Your description):
> > > 
> > > > the reordering of __bdos w.r.t. initialization of the size parameter 
> > > > but to also account for DSE of the assignment, we can abstract this 
> > > > problem to that of DFA being unable to see implicit use of the size 
> > > > parameter in the __bdos call.
> > > 
> > > basically is correct.  However, with the following exception:
> > > 
> > > The implicit use of the size parameter in the __bdos call is not always 
> > > there, it ONLY exists WHEN the __bdos is able to evaluated to an 
> > > expression of the size parameter in the “objsz” phase, i.e., the 
> > > “Situation 1” of the above example. 
> > > In the “Situation 2”, when the __bdos does not see the TYPE of the real 
> > > object,  it does not see the counted_by information from the TYPE, 
> > > therefore,  it is not able to evaluate the size of the object through the 
> > > counted_by information.  As a result, the implicit use of the size 
> > > parameter in the __bdos call does NOT exist at all.  The optimizer can 
> > > freely reorder the initialization of the size parameter with the __bdos 
> > > call since there is no data flow dependency between these two. 
> > > 
> > > With this exception in mind, we can see that your proposed “option 2” 
> > > (making the type of size “volatile”) is too conservative, it will  
> > > disable many optimizations  unnecessarily, even though it’s safe and 
> > > simple to implement. 
> > > 
> > > As a compiler optimization person for many many years, I really don’t 
> > > want to take this approach at this moment.  -:)
> > > 
> > > 2. Some facts I’d like to mention:
> > > 
> > > A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
> > > optimization stage. During RTL stage,  the __bdos call has already been 
> > > replaced by an expression of the size parameter or a constant, the data 
> > > dependency is explicitly in the IR already.  I believe that the data 
> > > analysis in RTL stage should pick up the data dependency correctly, No 
> > > special handling is needed in RTL.
> > > 
> > > B. If the __bdos call cannot see the real object , it has no way to get 
> > > the “counted_by” field from the TYPE of the real object. So, if we try to 
> > > add the implicit use of the “counted_by” field to the __bdos call, the 
> > > object instantiation should be in the same routine as the __bdos call.  
> > > Both the FE and the gimplification phase are too early to do this work. 
> > > 
> > > 2. Then, what’s the best approach to resolve this problem:
> > > 
> > > There were several suggestions so far:
> > > 
> > > A.  Add an additional argument, the size parameter,  to __bdos, 
> > >      A.1, during FE;
> > >      A.2, during gimplification phase;
> > > B.  Encode the implicit USE  in the type of size, to make the size 
> > > “volatile”;
> > > C.  Encode the implicit USE  in the type of buf, then update the 
> > > optimization passes to use this implicit USE encoded in the type of buf.
> > > 
> > > As I explained in the above, 
> > > ** Approach A (both A.1 and A.2) does not work;
> > > ** Approach B will have big performance impact, I’d prefer not to take 
> > > this approach at this moment.
> > > ** Approach C will be a lot of change in GCC, and also not very necessary 
> > > since the ONLY implicit use of the size parameter is in the __bdos call 
> > > when __bdos can see the real object.
> > > 
> > > So, all the above proposed approaches, A, B, C, are not very good. 
> > > 
> > > Then, maybe the following might work better?
> > > 
> > > In the tree optimization stage, 
> > >    * After the inlining transformation applied,  
> > > +  * Before the data-flow related optimization happens, 
> > > +  * when the data flow analysis is constructed, 
> > > 
> > > For each call to __bdos, add the implicit use of size parameter. 
> > > 
> > > Is this doable? 
> > 
> > Here is another proposal:  Add a new builtin function
> > 
> > __builtin_with_size(x, size)
> > 
> > that return x but behaves similar to an allocation
> > function in that BDOS can look at the size argument
> > to discover the size.
> > 
> > The FE insers this function when the field is accessed:
> > 
> > __builtin_with_size(x.buf, x.L);
> 
> Thanks for the proposal!
> 
> So what you suggested is:
> 
> For every x.buf,  change it as a __builtin_with_size(x.buf, x.L) in the FE, 
> then the call to the _bdos (x.buf, 1) will
> Become:
> 
>    _bdos(__builtin_with_size(x.buf, x.L), 1)?
> 
> Then the implicit use of x.L in _bdos(x.buf.1) will become explicit?
> 
> This looks like a very promising solution.
> 
> Will study this a. Little bit more.

Yes, the load will be created explicitely in the FE
at the right position.   The BDOS pass can then
later propagate the size to where it is used:

x = &__builtin_with_size(x.buf, x.L)

...other stuff is happening...

__bdos(x, 1).

See for a working example: https://godbolt.org/z/Ej3s1GToa
which shows that reordering is still possible.


I think this should be easy to implement
because it is similar to how BDOS works with
builtin allocation functions.

And this feature seems generally useful. 

I am not sure whether the builtin to take a pointer
argument or the object itself.

Note that the builtin does nothing. It just ensure
that the size argument is evaluated at the right
point in time.

Maybe it should have arguments for min / max 
subobject etc..   Not sure.


Martin

> 
> Qing
> > 
> > 
> > Martin
> > 
> > 
> > 
> > > 
> > > Otherwise, we might need to take the “volatile” approach. 
> > > 
> > > Let me know your suggestion and comment.
> > > 
> > > Thanks a lot.
> > > 
> > > Qing
> > > 
> > > 
> > > > __bdos is the one such implicit user of the size parameter and you're 
> > > > proposing to solve this by encoding the relationship between buffer and 
> > > > size at the __bdos call site.  But what about the case when the 
> > > > instantiation of the object is not at the same place as the __bdos call 
> > > > site, i.e. the DFA is unable to make that relationship?
> > > > 
> > > > The example Martin showed where the subobject gets "hidden" behind a 
> > > > pointer was a trivial one where DFA *may* actually work in practice 
> > > > (because the object-size pass can thread through these assignments) but 
> > > > think about this one:
> > > > 
> > > > struct A
> > > > {
> > > > size_t size;
> > > > char buf[] __attribute__((counted_by(size)));
> > > > }
> > > > 
> > > > static size_t
> > > > get_size_of (void *ptr)
> > > > {
> > > > return __bdos (ptr, 1);
> > > > }
> > > > 
> > > > void
> > > > foo (size_t sz)
> > > > {
> > > > struct A *obj = __builtin_malloc (sz);
> > > > obj->size = sz;
> > > > 
> > > > ...
> > > > __builtin_printf ("%zu\n", get_size_of (obj->array));
> > > > ...
> > > > }
> > > > 
> > > > Until get_size_of is inlined, no DFA can see the __bdos call in the 
> > > > same place as the point where obj is allocated.  As a result, the 
> > > > assignment to obj->size could get reordered (or the store eliminated) 
> > > > w.r.t. the __bdos call until the inlining happens.
> > > > 
> > > > As a result, the relationship between buf and size established by the 
> > > > attribute needs to be encoded into the type somehow.  There are two 
> > > > options:
> > > > 
> > > > Option 1: Encode the relationship in the type of buf
> > > > 
> > > > This is kinda what you end up doing with component_ref_has_counted_by 
> > > > and it does show the relationship if one is looking (through that 
> > > > call), but nothing more that can be used to, e.g. prevent reordering or 
> > > > tell the optimizer that the reference to the buf member may imply a 
> > > > reference to the size member as well.  This could be remedied by 
> > > > somehow encoding the USES relationship for size into the type of buf 
> > > > that the optimization passes can see.  I feel like this may be a bit 
> > > > convoluted to specify in a future language extension in a way that will 
> > > > actually be well understood by developers, but it will likely generate 
> > > > faster runtime code.  This will also likely require a bigger change 
> > > > across passes.
> > > > 
> > > > Option 2: Encode the relationship in the type of size
> > > > 
> > > > The other option is to enhance the type of size somehow so that it 
> > > > discourages reordering and store elimination, basically pessimizing 
> > > > code.  I think volatile semantics might be the way to do this and may 
> > > > even be straightforward to specify in the future language extension 
> > > > given that it builds on a known language construct and is thematically 
> > > > related.  However it does pessimize output for code that implements 
> > > > __counted_by__.
> > > > 
> > > > Thanks,
> > > > Sid
> 

Reply via email to