Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

Qing Zhao Tue, 24 Oct 2023 13:30:52 -0700

Hi, Sid,

Really appreciate for your example and detailed explanation. Very helpful.
I think that this example is an excellent example to show (almost) all the 
issues we need to consider.


I slightly modified this example to make it to be compilable and run-able, as 
following: 
(but I still cannot make the incorrect reordering or DSE happening, anyway, the 
potential reordering possibility is there…)

  1 #include <malloc.h>
  2 struct A
  3 {
  4  size_t size;
  5  char buf[] __attribute__((counted_by(size)));
  6 };
  7 
  8 static size_t
  9 get_size_from (void *ptr)
 10 {
 11  return __builtin_dynamic_object_size (ptr, 1);
 12 }
 13 
 14 void
 15 foo (size_t sz)
 16 {
 17  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 18  obj->size = sz;
 19  obj->buf[0] = 2;
 20  __builtin_printf (“%d\n", get_size_from (obj->buf));
 21  return;
 22 }
 23 
 24 int main ()
 25 {
 26  foo (20);
 27  return 0;
 28 }

With my GCC, it was compiled and worked:
[opc@qinzhao-ol8u3-x86 ]$  /home/opc/Install/latest-d/bin/gcc -O1 t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
20
Situation 1: With O1 and above, the routine “get_size_from” was inlined into 
“foo”, therefore, the call to __bdos is in the same routine as the 
instantiation of the object, and the TYPE information and the attached 
counted_by attribute information in the TYPE of the object can be USED by the 
__bdos call to compute the final object size. 

[opc@qinzhao-ol8u3-x86]$  /home/opc/Install/latest-d/bin/gcc -O0  t5.c
[opc@qinzhao-ol8u3-x86 ]$ ./a.out
-1
Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, 
therefore, the call to __bdos is Not in the same routine as the instantiation 
of the object, As a result, the TYPE info and the attached counted_by info of 
the object can NOT be USED by the __bdos call. 

Keep in mind of the above 2 situations, we will refer them in below:

1. First,  the problem we are trying to resolve is:

(Your description):

>  the reordering of __bdos w.r.t. initialization of the size parameter but to 
> also account for DSE of the assignment, we can abstract this problem to that 
> of DFA being unable to see implicit use of the size parameter in the __bdos 
> call.

basically is correct.  However, with the following exception:

The implicit use of the size parameter in the __bdos call is not always there, 
it ONLY exists WHEN the __bdos is able to evaluated to an expression of the 
size parameter in the “objsz” phase, i.e., the “Situation 1” of the above 
example. 
 In the “Situation 2”, when the __bdos does not see the TYPE of the real 
object,  it does not see the counted_by information from the TYPE, therefore,  
it is not able to evaluate the size of the object through the counted_by 
information.  As a result, the implicit use of the size parameter in the __bdos 
call does NOT exist at all.  The optimizer can freely reorder the 
initialization of the size parameter with the __bdos call since there is no 
data flow dependency between these two. 

With this exception in mind, we can see that your proposed “option 2” (making 
the type of size “volatile”) is too conservative, it will  disable many 
optimizations  unnecessarily, even though it’s safe and simple to implement. 

As a compiler optimization person for many many years, I really don’t want to 
take this approach at this moment.  -:)

2. Some facts I’d like to mention:

A.  The incorrect reordering (or CSE) potential ONLY exists in the TREE 
optimization stage. During RTL stage,  the __bdos call has already been 
replaced by an expression of the size parameter or a constant, the data 
dependency is explicitly in the IR already.  I believe that the data analysis 
in RTL stage should pick up the data dependency correctly, No special handling 
is needed in RTL.

B. If the __bdos call cannot see the real object , it has no way to get the 
“counted_by” field from the TYPE of the real object. So, if we try to add the 
implicit use of the “counted_by” field to the __bdos call, the object 
instantiation should be in the same routine as the __bdos call.  Both the FE 
and the gimplification phase are too early to do this work. 

2. Then, what’s the best approach to resolve this problem:

There were several suggestions so far:

A.  Add an additional argument, the size parameter,  to __bdos, 
      A.1, during FE;
      A.2, during gimplification phase;
B.  Encode the implicit USE  in the type of size, to make the size “volatile”;
C.  Encode the implicit USE  in the type of buf, then update the optimization 
passes to use this implicit USE encoded in the type of buf.

As I explained in the above, 
** Approach A (both A.1 and A.2) does not work;
** Approach B will have big performance impact, I’d prefer not to take this 
approach at this moment.
** Approach C will be a lot of change in GCC, and also not very necessary since 
the ONLY implicit use of the size parameter is in the __bdos call when __bdos 
can see the real object.

So, all the above proposed approaches, A, B, C, are not very good. 

Then, maybe the following might work better?

In the tree optimization stage, 
    * After the inlining transformation applied,  
+  * Before the data-flow related optimization happens, 
+  * when the data flow analysis is constructed, 

For each call to __bdos, add the implicit use of size parameter. 

Is this doable? 

Otherwise, we might need to take the “volatile” approach. 

Let me know your suggestion and comment.

Thanks a lot.

Qing


>  __bdos is the one such implicit user of the size parameter and you're 
> proposing to solve this by encoding the relationship between buffer and size 
> at the __bdos call site.  But what about the case when the instantiation of 
> the object is not at the same place as the __bdos call site, i.e. the DFA is 
> unable to make that relationship?
> 
> The example Martin showed where the subobject gets "hidden" behind a pointer 
> was a trivial one where DFA *may* actually work in practice (because the 
> object-size pass can thread through these assignments) but think about this 
> one:
> 
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> }
> 
> static size_t
> get_size_of (void *ptr)
> {
>  return __bdos (ptr, 1);
> }
> 
> void
> foo (size_t sz)
> {
>  struct A *obj = __builtin_malloc (sz);
>  obj->size = sz;
> 
>  ...
>  __builtin_printf ("%zu\n", get_size_of (obj->array));
>  ...
> }
> 
> Until get_size_of is inlined, no DFA can see the __bdos call in the same 
> place as the point where obj is allocated.  As a result, the assignment to 
> obj->size could get reordered (or the store eliminated) w.r.t. the __bdos 
> call until the inlining happens.
> 
> As a result, the relationship between buf and size established by the 
> attribute needs to be encoded into the type somehow.  There are two options:
> 
> Option 1: Encode the relationship in the type of buf
> 
> This is kinda what you end up doing with component_ref_has_counted_by and it 
> does show the relationship if one is looking (through that call), but nothing 
> more that can be used to, e.g. prevent reordering or tell the optimizer that 
> the reference to the buf member may imply a reference to the size member as 
> well.  This could be remedied by somehow encoding the USES relationship for 
> size into the type of buf that the optimization passes can see.  I feel like 
> this may be a bit convoluted to specify in a future language extension in a 
> way that will actually be well understood by developers, but it will likely 
> generate faster runtime code.  This will also likely require a bigger change 
> across passes.
> 
> Option 2: Encode the relationship in the type of size
> 
> The other option is to enhance the type of size somehow so that it 
> discourages reordering and store elimination, basically pessimizing code.  I 
> think volatile semantics might be the way to do this and may even be 
> straightforward to specify in the future language extension given that it 
> builds on a known language construct and is thematically related.  However it 
> does pessimize output for code that implements __counted_by__.
> 
> Thanks,
> Sid

Re: HELP: Will the reordering happen? Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

Reply via email to