Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao:
> Hi, Sid,
> 
> Really appreciate for your example and detailed explanation. Very helpful.
> I think that this example is an excellent example to show (almost) all the 
> issues we need to consider.
> 
> I slightly modified this example to make it to be compilable and run-able, as 
> following: 
> (but I still cannot make the incorrect reordering or DSE happening, anyway, 
> the potential reordering possibility is there…)
> 
>   1 #include <malloc.h>
>   2 struct A
>   3 {
>   4  size_t size;
>   5  char buf[] __attribute__((counted_by(size)));
>   6 };
>   7 
>   8 static size_t
>   9 get_size_from (void *ptr)
>  10 {
>  11  return __builtin_dynamic_object_size (ptr, 1);
>  12 }
>  13 
>  14 void
>  15 foo (size_t sz)
>  16 {
>  17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>  18  obj->size = sz;
>  19  obj->buf[0] = 2;
>  20  __builtin_printf (“%d\n", get_size_from (obj->buf));
>  21  return;
>  22 }
>  23 
>  24 int main ()
>  25 {
>  26  foo (20);
>  27  return 0;
>  28 }
> 
> With my GCC, it was compiled and worked:
> [opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> 20
> Situation 1: With O1 and above, the routine “get_size_from” was inlined into 
> “foo”, therefore, the call to __bdos is in the same routine as the 
> instantiation of the object, and the TYPE information and the attached 
> counted_by attribute information in the TYPE of the object can be USED by the 
> __bdos call to compute the final object size. 
> 
> [opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
> [opc@qinzhao-ol8u3-x86 ]$ ./a.out
> -1
> Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
> therefore, the call to __bdos is Not in the same routine as the instantiation 
> of the object, As a result, the TYPE info and the attached counted_by info of 
> the object can NOT be USED by the __bdos call. 
> 
> Keep in mind of the above 2 situations, we will refer them in below:
> 
> 1. First,  the problem we are trying to resolve is:
> 
> (Your description):
> 
> >  the reordering of __bdos w.r.t. initialization of the size parameter but 
> > to also account for DSE of the assignment, we can abstract this problem to 
> > that of DFA being unable to see implicit use of the size parameter in the 
> > __bdos call.
> 
> basically is correct.  However, with the following exception:
> 
> The implicit use of the size parameter in the __bdos call is not always 
> there, it ONLY exists WHEN the __bdos is able to evaluated to an expression 
> of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the 
> above example. 
>  In the “Situation 2”, when the __bdos does not see the TYPE of the real 
> object,  it does not see the counted_by information from the TYPE, therefore, 
>  it is not able to evaluate the size of the object through the counted_by 
> information.  As a result, the implicit use of the size parameter in the 
> __bdos call does NOT exist at all.  The optimizer can freely reorder the 
> initialization of the size parameter with the __bdos call since there is no 
> data flow dependency between these two. 
> 
> With this exception in mind, we can see that your proposed “option 2” (making 
> the type of size “volatile”) is too conservative, it will  disable many 
> optimizations  unnecessarily, even though it’s safe and simple to implement. 
> 
> As a compiler optimization person for many many years, I really don’t want to 
> take this approach at this moment.  -:)
> 
> 2. Some facts I’d like to mention:
> 
> A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
> optimization stage. During RTL stage,  the __bdos call has already been 
> replaced by an expression of the size parameter or a constant, the data 
> dependency is explicitly in the IR already.  I believe that the data analysis 
> in RTL stage should pick up the data dependency correctly, No special 
> handling is needed in RTL.
> 
> B. If the __bdos call cannot see the real object , it has no way to get the 
> “counted_by” field from the TYPE of the real object. So, if we try to add the 
> implicit use of the “counted_by” field to the __bdos call, the object 
> instantiation should be in the same routine as the __bdos call.  Both the FE 
> and the gimplification phase are too early to do this work. 
> 
> 2. Then, what’s the best approach to resolve this problem:
> 
> There were several suggestions so far:
> 
> A.  Add an additional argument, the size parameter,  to __bdos, 
>       A.1, during FE;
>       A.2, during gimplification phase;
> B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
> C.  Encode the implicit USE  in the type of buf, then update the optimization 
> passes to use this implicit USE encoded in the type of buf.
> 
> As I explained in the above, 
> ** Approach A (both A.1 and A.2) does not work;
> ** Approach B will have big performance impact, I’d prefer not to take this 
> approach at this moment.
> ** Approach C will be a lot of change in GCC, and also not very necessary 
> since the ONLY implicit use of the size parameter is in the __bdos call when 
> __bdos can see the real object.
> 
> So, all the above proposed approaches, A, B, C, are not very good. 
> 
> Then, maybe the following might work better?
> 
> In the tree optimization stage, 
>     * After the inlining transformation applied,  
> +  * Before the data-flow related optimization happens, 
> +  * when the data flow analysis is constructed, 
> 
> For each call to __bdos, add the implicit use of size parameter. 
> 
> Is this doable? 

Here is another proposal:  Add a new builtin function

__builtin_with_size(x, size)

that return x but behaves similar to an allocation
function in that BDOS can look at the size argument
to discover the size.

The FE insers this function when the field is accessed:

__builtin_with_size(x.buf, x.L);


Martin



> 
> Otherwise, we might need to take the “volatile” approach. 
> 
> Let me know your suggestion and comment.
> 
> Thanks a lot.
> 
> Qing
> 
> 
> >  __bdos is the one such implicit user of the size parameter and you're 
> > proposing to solve this by encoding the relationship between buffer and 
> > size at the __bdos call site.  But what about the case when the 
> > instantiation of the object is not at the same place as the __bdos call 
> > site, i.e. the DFA is unable to make that relationship?
> > 
> > The example Martin showed where the subobject gets "hidden" behind a 
> > pointer was a trivial one where DFA *may* actually work in practice 
> > (because the object-size pass can thread through these assignments) but 
> > think about this one:
> > 
> > struct A
> > {
> >  size_t size;
> >  char buf[] __attribute__((counted_by(size)));
> > }
> > 
> > static size_t
> > get_size_of (void *ptr)
> > {
> >  return __bdos (ptr, 1);
> > }
> > 
> > void
> > foo (size_t sz)
> > {
> >  struct A *obj = __builtin_malloc (sz);
> >  obj->size = sz;
> > 
> >  ...
> >  __builtin_printf ("%zu\n", get_size_of (obj->array));
> >  ...
> > }
> > 
> > Until get_size_of is inlined, no DFA can see the __bdos call in the same 
> > place as the point where obj is allocated.  As a result, the assignment to 
> > obj->size could get reordered (or the store eliminated) w.r.t. the __bdos 
> > call until the inlining happens.
> > 
> > As a result, the relationship between buf and size established by the 
> > attribute needs to be encoded into the type somehow.  There are two options:
> > 
> > Option 1: Encode the relationship in the type of buf
> > 
> > This is kinda what you end up doing with component_ref_has_counted_by and 
> > it does show the relationship if one is looking (through that call), but 
> > nothing more that can be used to, e.g. prevent reordering or tell the 
> > optimizer that the reference to the buf member may imply a reference to the 
> > size member as well.  This could be remedied by somehow encoding the USES 
> > relationship for size into the type of buf that the optimization passes can 
> > see.  I feel like this may be a bit convoluted to specify in a future 
> > language extension in a way that will actually be well understood by 
> > developers, but it will likely generate faster runtime code.  This will 
> > also likely require a bigger change across passes.
> > 
> > Option 2: Encode the relationship in the type of size
> > 
> > The other option is to enhance the type of size somehow so that it 
> > discourages reordering and store elimination, basically pessimizing code.  
> > I think volatile semantics might be the way to do this and may even be 
> > straightforward to specify in the future language extension given that it 
> > builds on a known language construct and is thematically related.  However 
> > it does pessimize output for code that implements __counted_by__.
> > 
> > Thanks,
> > Sid
> 

Reply via email to