> On Oct 24, 2023, at 4:38 PM, Martin Uecker <uec...@tugraz.at> wrote: > > Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao: >> Hi, Sid, >> >> Really appreciate for your example and detailed explanation. Very helpful. >> I think that this example is an excellent example to show (almost) all the >> issues we need to consider. >> >> I slightly modified this example to make it to be compilable and run-able, >> as following: >> (but I still cannot make the incorrect reordering or DSE happening, anyway, >> the potential reordering possibility is there…) >> >> 1 #include <malloc.h> >> 2 struct A >> 3 { >> 4 size_t size; >> 5 char buf[] __attribute__((counted_by(size))); >> 6 }; >> 7 >> 8 static size_t >> 9 get_size_from (void *ptr) >> 10 { >> 11 return __builtin_dynamic_object_size (ptr, 1); >> 12 } >> 13 >> 14 void >> 15 foo (size_t sz) >> 16 { >> 17 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char)); >> 18 obj->size = sz; >> 19 obj->buf[0] = 2; >> 20 __builtin_printf (“%d\n", get_size_from (obj->buf)); >> 21 return; >> 22 } >> 23 >> 24 int main () >> 25 { >> 26 foo (20); >> 27 return 0; >> 28 } >> >> With my GCC, it was compiled and worked: >> [opc@qinzhao-ol8u3-x86 ]$ /home/opc/Install/latest-d/bin/gcc -O1 t5.c >> [opc@qinzhao-ol8u3-x86 ]$ ./a.out >> 20 >> Situation 1: With O1 and above, the routine “get_size_from” was inlined into >> “foo”, therefore, the call to __bdos is in the same routine as the >> instantiation of the object, and the TYPE information and the attached >> counted_by attribute information in the TYPE of the object can be USED by >> the __bdos call to compute the final object size. >> >> [opc@qinzhao-ol8u3-x86]$ /home/opc/Install/latest-d/bin/gcc -O0 t5.c >> [opc@qinzhao-ol8u3-x86 ]$ ./a.out >> -1 >> Situation 2: With O0, the routine “get_size_from” was NOT inlined into >> “foo”, therefore, the call to __bdos is Not in the same routine as the >> instantiation of the object, As a result, the TYPE info and the attached >> counted_by info of the object can NOT be USED by the __bdos call. >> >> Keep in mind of the above 2 situations, we will refer them in below: >> >> 1. First, the problem we are trying to resolve is: >> >> (Your description): >> >>> the reordering of __bdos w.r.t. initialization of the size parameter but to >>> also account for DSE of the assignment, we can abstract this problem to >>> that of DFA being unable to see implicit use of the size parameter in the >>> __bdos call. >> >> basically is correct. However, with the following exception: >> >> The implicit use of the size parameter in the __bdos call is not always >> there, it ONLY exists WHEN the __bdos is able to evaluated to an expression >> of the size parameter in the “objsz” phase, i.e., the “Situation 1” of the >> above example. >> In the “Situation 2”, when the __bdos does not see the TYPE of the real >> object, it does not see the counted_by information from the TYPE, >> therefore, it is not able to evaluate the size of the object through the >> counted_by information. As a result, the implicit use of the size parameter >> in the __bdos call does NOT exist at all. The optimizer can freely reorder >> the initialization of the size parameter with the __bdos call since there is >> no data flow dependency between these two. >> >> With this exception in mind, we can see that your proposed “option 2” >> (making the type of size “volatile”) is too conservative, it will disable >> many optimizations unnecessarily, even though it’s safe and simple to >> implement. >> >> As a compiler optimization person for many many years, I really don’t want >> to take this approach at this moment. -:) >> >> 2. Some facts I’d like to mention: >> >> A. The incorrect reordering (or CSE) potential ONLY exists in the TREE >> optimization stage. During RTL stage, the __bdos call has already been >> replaced by an expression of the size parameter or a constant, the data >> dependency is explicitly in the IR already. I believe that the data >> analysis in RTL stage should pick up the data dependency correctly, No >> special handling is needed in RTL. >> >> B. If the __bdos call cannot see the real object , it has no way to get the >> “counted_by” field from the TYPE of the real object. So, if we try to add >> the implicit use of the “counted_by” field to the __bdos call, the object >> instantiation should be in the same routine as the __bdos call. Both the FE >> and the gimplification phase are too early to do this work. >> >> 2. Then, what’s the best approach to resolve this problem: >> >> There were several suggestions so far: >> >> A. Add an additional argument, the size parameter, to __bdos, >> A.1, during FE; >> A.2, during gimplification phase; >> B. Encode the implicit USE in the type of size, to make the size >> “volatile”; >> C. Encode the implicit USE in the type of buf, then update the >> optimization passes to use this implicit USE encoded in the type of buf. >> >> As I explained in the above, >> ** Approach A (both A.1 and A.2) does not work; >> ** Approach B will have big performance impact, I’d prefer not to take this >> approach at this moment. >> ** Approach C will be a lot of change in GCC, and also not very necessary >> since the ONLY implicit use of the size parameter is in the __bdos call when >> __bdos can see the real object. >> >> So, all the above proposed approaches, A, B, C, are not very good. >> >> Then, maybe the following might work better? >> >> In the tree optimization stage, >> * After the inlining transformation applied, >> + * Before the data-flow related optimization happens, >> + * when the data flow analysis is constructed, >> >> For each call to __bdos, add the implicit use of size parameter. >> >> Is this doable? > > Here is another proposal: Add a new builtin function > > __builtin_with_size(x, size) > > that return x but behaves similar to an allocation > function in that BDOS can look at the size argument > to discover the size. > > The FE insers this function when the field is accessed: > > __builtin_with_size(x.buf, x.L);
Thanks for the proposal! So what you suggested is: For every x.buf, change it as a __builtin_with_size(x.buf, x.L) in the FE, then the call to the _bdos (x.buf, 1) will Become: _bdos(__builtin_with_size(x.buf, x.L), 1)? Then the implicit use of x.L in _bdos(x.buf.1) will become explicit? This looks like a very promising solution. Will study this a. Little bit more. Qing > > > Martin > > > >> >> Otherwise, we might need to take the “volatile” approach. >> >> Let me know your suggestion and comment. >> >> Thanks a lot. >> >> Qing >> >> >>> __bdos is the one such implicit user of the size parameter and you're >>> proposing to solve this by encoding the relationship between buffer and >>> size at the __bdos call site. But what about the case when the >>> instantiation of the object is not at the same place as the __bdos call >>> site, i.e. the DFA is unable to make that relationship? >>> >>> The example Martin showed where the subobject gets "hidden" behind a >>> pointer was a trivial one where DFA *may* actually work in practice >>> (because the object-size pass can thread through these assignments) but >>> think about this one: >>> >>> struct A >>> { >>> size_t size; >>> char buf[] __attribute__((counted_by(size))); >>> } >>> >>> static size_t >>> get_size_of (void *ptr) >>> { >>> return __bdos (ptr, 1); >>> } >>> >>> void >>> foo (size_t sz) >>> { >>> struct A *obj = __builtin_malloc (sz); >>> obj->size = sz; >>> >>> ... >>> __builtin_printf ("%zu\n", get_size_of (obj->array)); >>> ... >>> } >>> >>> Until get_size_of is inlined, no DFA can see the __bdos call in the same >>> place as the point where obj is allocated. As a result, the assignment to >>> obj->size could get reordered (or the store eliminated) w.r.t. the __bdos >>> call until the inlining happens. >>> >>> As a result, the relationship between buf and size established by the >>> attribute needs to be encoded into the type somehow. There are two options: >>> >>> Option 1: Encode the relationship in the type of buf >>> >>> This is kinda what you end up doing with component_ref_has_counted_by and >>> it does show the relationship if one is looking (through that call), but >>> nothing more that can be used to, e.g. prevent reordering or tell the >>> optimizer that the reference to the buf member may imply a reference to the >>> size member as well. This could be remedied by somehow encoding the USES >>> relationship for size into the type of buf that the optimization passes can >>> see. I feel like this may be a bit convoluted to specify in a future >>> language extension in a way that will actually be well understood by >>> developers, but it will likely generate faster runtime code. This will >>> also likely require a bigger change across passes. >>> >>> Option 2: Encode the relationship in the type of size >>> >>> The other option is to enhance the type of size somehow so that it >>> discourages reordering and store elimination, basically pessimizing code. >>> I think volatile semantics might be the way to do this and may even be >>> straightforward to specify in the future language extension given that it >>> builds on a known language construct and is thematically related. However >>> it does pessimize output for code that implements __counted_by__. >>> >>> Thanks, >>> Sid