Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao: > Hi, Sid, > > Really appreciate for your example and detailed explanation. Very helpful. > I think that this example is an excellent example to show (almost) all the > issues we need to consider. > > I slightly modified this example to make it to be compilable and run-able, as > following: > (but I still cannot make the incorrect reordering or DSE happening, anyway, > the potential reordering possibility is there…) > > 1 #include <malloc.h> > 2 struct A > 3 { > 4 size_t size; > 5 char buf[] __attribute__((counted_by(size))); > 6 }; > 7 > 8 static size_t > 9 get_size_from (void *ptr) > 10 { > 11 return __builtin_dynamic_object_size (ptr, 1); > 12 } > 13 > 14 void > 15 foo (size_t sz) > 16 { > 17 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); > 18 obj->size = sz; > 19 obj->buf[0] = 2; > 20 __builtin_printf (“%d\n", get_size_from (obj->buf)); > 21 return; > 22 } > 23 > 24 int main () > 25 { > 26 foo (20); > 27 return 0; > 28 } > > With my GCC, it was compiled and worked: > [opc@qinzhao-ol8u3-x86 ]$ /home/opc/Install/latest-d/bin/gcc -O1 t5.c > [opc@qinzhao-ol8u3-x86 ]$ ./a.out > 20 > Situation 1: With O1 and above, the routine “get_size_from” was inlined into > “foo”, therefore, the call to __bdos is in the same routine as the > instantiation of the object, and the TYPE information and the attached > counted_by attribute information in the TYPE of the object can be USED by the > __bdos call to compute the final object size. > > [opc@qinzhao-ol8u3-x86]$ /home/opc/Install/latest-d/bin/gcc -O0 t5.c > [opc@qinzhao-ol8u3-x86 ]$ ./a.out > -1 > Situation 2: With O0, the routine “get_size_from” was NOT inlined into “foo”, > therefore, the call to __bdos is Not in the same routine as the instantiation > of the object, As a result, the TYPE info and the attached counted_by info of > the object can NOT be USED by the __bdos call. > > Keep in mind of the above 2 situations, we will refer them in below: > > 1. First, the problem we are trying to resolve is: > > (Your description): > > > the reordering of __bdos w.r.t. initialization of the size parameter but > > to also account for DSE of the assignment, we can abstract this problem to > > that of DFA being unable to see implicit use of the size parameter in the > > __bdos call. > > basically is correct. However, with the following exception: > > The implicit use of the size parameter in the __bdos call is not always > there, it ONLY exists WHEN the __bdos is able to evaluated to an expression > of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the > above example. > In the “Situation 2”, when the __bdos does not see the TYPE of the real > object, it does not see the counted_by information from the TYPE, therefore, > it is not able to evaluate the size of the object through the counted_by > information. As a result, the implicit use of the size parameter in the > __bdos call does NOT exist at all. The optimizer can freely reorder the > initialization of the size parameter with the __bdos call since there is no > data flow dependency between these two. > > With this exception in mind, we can see that your proposed “option 2” (making > the type of size “volatile”) is too conservative, it will disable many > optimizations unnecessarily, even though it’s safe and simple to implement. > > As a compiler optimization person for many many years, I really don’t want to > take this approach at this moment. -:) > > 2. Some facts I’d like to mention: > > A. The incorrect reordering (or CSE) potential ONLY exists in the TREE > optimization stage. During RTL stage, the __bdos call has already been > replaced by an expression of the size parameter or a constant, the data > dependency is explicitly in the IR already. I believe that the data analysis > in RTL stage should pick up the data dependency correctly, No special > handling is needed in RTL. > > B. If the __bdos call cannot see the real object , it has no way to get the > “counted_by” field from the TYPE of the real object. So, if we try to add the > implicit use of the “counted_by” field to the __bdos call, the object > instantiation should be in the same routine as the __bdos call. Both the FE > and the gimplification phase are too early to do this work. > > 2. Then, what’s the best approach to resolve this problem: > > There were several suggestions so far: > > A. Add an additional argument, the size parameter, to __bdos, > A.1, during FE; > A.2, during gimplification phase; > B. Encode the implicit USE in the type of size, to make the size “volatile”; > C. Encode the implicit USE in the type of buf, then update the optimization > passes to use this implicit USE encoded in the type of buf. > > As I explained in the above, > ** Approach A (both A.1 and A.2) does not work; > ** Approach B will have big performance impact, I’d prefer not to take this > approach at this moment. > ** Approach C will be a lot of change in GCC, and also not very necessary > since the ONLY implicit use of the size parameter is in the __bdos call when > __bdos can see the real object. > > So, all the above proposed approaches, A, B, C, are not very good. > > Then, maybe the following might work better? > > In the tree optimization stage, > * After the inlining transformation applied, > + * Before the data-flow related optimization happens, > + * when the data flow analysis is constructed, > > For each call to __bdos, add the implicit use of size parameter. > > Is this doable?
Here is another proposal: Add a new builtin function __builtin_with_size(x, size) that return x but behaves similar to an allocation function in that BDOS can look at the size argument to discover the size. The FE insers this function when the field is accessed: __builtin_with_size(x.buf, x.L); Martin > > Otherwise, we might need to take the “volatile” approach. > > Let me know your suggestion and comment. > > Thanks a lot. > > Qing > > > > __bdos is the one such implicit user of the size parameter and you're > > proposing to solve this by encoding the relationship between buffer and > > size at the __bdos call site. But what about the case when the > > instantiation of the object is not at the same place as the __bdos call > > site, i.e. the DFA is unable to make that relationship? > > > > The example Martin showed where the subobject gets "hidden" behind a > > pointer was a trivial one where DFA *may* actually work in practice > > (because the object-size pass can thread through these assignments) but > > think about this one: > > > > struct A > > { > > size_t size; > > char buf[] __attribute__((counted_by(size))); > > } > > > > static size_t > > get_size_of (void *ptr) > > { > > return __bdos (ptr, 1); > > } > > > > void > > foo (size_t sz) > > { > > struct A *obj = __builtin_malloc (sz); > > obj->size = sz; > > > > ... > > __builtin_printf ("%zu\n", get_size_of (obj->array)); > > ... > > } > > > > Until get_size_of is inlined, no DFA can see the __bdos call in the same > > place as the point where obj is allocated. As a result, the assignment to > > obj->size could get reordered (or the store eliminated) w.r.t. the __bdos > > call until the inlining happens. > > > > As a result, the relationship between buf and size established by the > > attribute needs to be encoded into the type somehow. There are two options: > > > > Option 1: Encode the relationship in the type of buf > > > > This is kinda what you end up doing with component_ref_has_counted_by and > > it does show the relationship if one is looking (through that call), but > > nothing more that can be used to, e.g. prevent reordering or tell the > > optimizer that the reference to the buf member may imply a reference to the > > size member as well. This could be remedied by somehow encoding the USES > > relationship for size into the type of buf that the optimization passes can > > see. I feel like this may be a bit convoluted to specify in a future > > language extension in a way that will actually be well understood by > > developers, but it will likely generate faster runtime code. This will > > also likely require a bigger change across passes. > > > > Option 2: Encode the relationship in the type of size > > > > The other option is to enhance the type of size somehow so that it > > discourages reordering and store elimination, basically pessimizing code. > > I think volatile semantics might be the way to do this and may even be > > straightforward to specify in the future language extension given that it > > builds on a known language construct and is thematically related. However > > it does pessimize output for code that implements __counted_by__. > > > > Thanks, > > Sid >