> On Oct 25, 2023, at 6:39 AM, Martin Uecker <uec...@tugraz.at> wrote: > > Am Mittwoch, dem 25.10.2023 um 12:25 +0200 schrieb Richard Biener: >> >>> Am 25.10.2023 um 10:16 schrieb Martin Uecker <uec...@tugraz.at>: >>> >>> Am Mittwoch, dem 25.10.2023 um 08:43 +0200 schrieb Richard Biener: >>>> >>>>>> Am 24.10.2023 um 22:38 schrieb Martin Uecker <uec...@tugraz.at>: >>>>> >>>>> Am Dienstag, dem 24.10.2023 um 20:30 +0000 schrieb Qing Zhao: >>>>>> Hi, Sid, >>>>>> >>>>>> Really appreciate for your example and detailed explanation. Very >>>>>> helpful. >>>>>> I think that this example is an excellent example to show (almost) all >>>>>> the issues we need to consider. >>>>>> >>>>>> I slightly modified this example to make it to be compilable and >>>>>> run-able, as following: >>>>>> (but I still cannot make the incorrect reordering or DSE happening, >>>>>> anyway, the potential reordering possibility is there…) >>>>>> >>>>>> 1 #include <malloc.h> >>>>>> 2 struct A >>>>>> 3 { >>>>>> 4 size_t size; >>>>>> 5 char buf[] __attribute__((counted_by(size))); >>>>>> 6 }; >>>>>> 7 >>>>>> 8 static size_t >>>>>> 9 get_size_from (void *ptr) >>>>>> 10 { >>>>>> 11 return __builtin_dynamic_object_size (ptr, 1); >>>>>> 12 } >>>>>> 13 >>>>>> 14 void >>>>>> 15 foo (size_t sz) >>>>>> 16 { >>>>>> 17 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * >>>>>> sizeof(char)); >>>>>> 18 obj->size = sz; >>>>>> 19 obj->buf[0] = 2; >>>>>> 20 __builtin_printf (“%d\n", get_size_from (obj->buf)); >>>>>> 21 return; >>>>>> 22 } >>>>>> 23 >>>>>> 24 int main () >>>>>> 25 { >>>>>> 26 foo (20); >>>>>> 27 return 0; >>>>>> 28 } >>>>>> >>>>>> With my GCC, it was compiled and worked: >>>>>> [opc@qinzhao-ol8u3-x86 ]$ /home/opc/Install/latest-d/bin/gcc -O1 t5.c >>>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out >>>>>> 20 >>>>>> Situation 1: With O1 and above, the routine “get_size_from” was inlined >>>>>> into “foo”, therefore, the call to __bdos is in the same routine as the >>>>>> instantiation of the object, and the TYPE information and the attached >>>>>> counted_by attribute information in the TYPE of the object can be USED >>>>>> by the __bdos call to compute the final object size. >>>>>> >>>>>> [opc@qinzhao-ol8u3-x86]$ /home/opc/Install/latest-d/bin/gcc -O0 t5.c >>>>>> [opc@qinzhao-ol8u3-x86 ]$ ./a.out >>>>>> -1 >>>>>> Situation 2: With O0, the routine “get_size_from” was NOT inlined into >>>>>> “foo”, therefore, the call to __bdos is Not in the same routine as the >>>>>> instantiation of the object, As a result, the TYPE info and the attached >>>>>> counted_by info of the object can NOT be USED by the __bdos call. >>>>>> >>>>>> Keep in mind of the above 2 situations, we will refer them in below: >>>>>> >>>>>> 1. First, the problem we are trying to resolve is: >>>>>> >>>>>> (Your description): >>>>>> >>>>>>> the reordering of __bdos w.r.t. initialization of the size parameter >>>>>>> but to also account for DSE of the assignment, we can abstract this >>>>>>> problem to that of DFA being unable to see implicit use of the size >>>>>>> parameter in the __bdos call. >>>>>> >>>>>> basically is correct. However, with the following exception: >>>>>> >>>>>> The implicit use of the size parameter in the __bdos call is not always >>>>>> there, it ONLY exists WHEN the __bdos is able to evaluated to an >>>>>> expression of the size parameter in the “objsz” phase, i.e., the >>>>>> “Situation 1” of the above example. >>>>>> In the “Situation 2”, when the __bdos does not see the TYPE of the real >>>>>> object, it does not see the counted_by information from the TYPE, >>>>>> therefore, it is not able to evaluate the size of the object through >>>>>> the counted_by information. As a result, the implicit use of the size >>>>>> parameter in the __bdos call does NOT exist at all. The optimizer can >>>>>> freely reorder the initialization of the size parameter with the __bdos >>>>>> call since there is no data flow dependency between these two. >>>>>> >>>>>> With this exception in mind, we can see that your proposed “option 2” >>>>>> (making the type of size “volatile”) is too conservative, it will >>>>>> disable many optimizations unnecessarily, even though it’s safe and >>>>>> simple to implement. >>>>>> >>>>>> As a compiler optimization person for many many years, I really don’t >>>>>> want to take this approach at this moment. -:) >>>>>> >>>>>> 2. Some facts I’d like to mention: >>>>>> >>>>>> A. The incorrect reordering (or CSE) potential ONLY exists in the TREE >>>>>> optimization stage. During RTL stage, the __bdos call has already been >>>>>> replaced by an expression of the size parameter or a constant, the data >>>>>> dependency is explicitly in the IR already. I believe that the data >>>>>> analysis in RTL stage should pick up the data dependency correctly, No >>>>>> special handling is needed in RTL. >>>>>> >>>>>> B. If the __bdos call cannot see the real object , it has no way to get >>>>>> the “counted_by” field from the TYPE of the real object. So, if we try >>>>>> to add the implicit use of the “counted_by” field to the __bdos call, >>>>>> the object instantiation should be in the same routine as the __bdos >>>>>> call. Both the FE and the gimplification phase are too early to do this >>>>>> work. >>>>>> >>>>>> 2. Then, what’s the best approach to resolve this problem: >>>>>> >>>>>> There were several suggestions so far: >>>>>> >>>>>> A. Add an additional argument, the size parameter, to __bdos, >>>>>> A.1, during FE; >>>>>> A.2, during gimplification phase; >>>>>> B. Encode the implicit USE in the type of size, to make the size >>>>>> “volatile”; >>>>>> C. Encode the implicit USE in the type of buf, then update the >>>>>> optimization passes to use this implicit USE encoded in the type of buf. >>>>>> >>>>>> As I explained in the above, >>>>>> ** Approach A (both A.1 and A.2) does not work; >>>>>> ** Approach B will have big performance impact, I’d prefer not to take >>>>>> this approach at this moment. >>>>>> ** Approach C will be a lot of change in GCC, and also not very >>>>>> necessary since the ONLY implicit use of the size parameter is in the >>>>>> __bdos call when __bdos can see the real object. >>>>>> >>>>>> So, all the above proposed approaches, A, B, C, are not very good. >>>>>> >>>>>> Then, maybe the following might work better? >>>>>> >>>>>> In the tree optimization stage, >>>>>> * After the inlining transformation applied, >>>>>> + * Before the data-flow related optimization happens, >>>>>> + * when the data flow analysis is constructed, >>>>>> >>>>>> For each call to __bdos, add the implicit use of size parameter. >>>>>> >>>>>> Is this doable? >>>>> >>>>> Here is another proposal: Add a new builtin function >>>>> >>>>> __builtin_with_size(x, size) >>>>> >>>>> that return x but behaves similar to an allocation >>>>> function in that BDOS can look at the size argument >>>>> to discover the size. >>>>> >>>>> The FE insers this function when the field is accessed: >>>> >>>> When it’s set I suppose. Turn >>>> >>>> X.l = n; >>>> >>>> Into >>>> >>>> X.l = __builtin_with_size (x.buf, n); >>> >>> It would turn >>> >>> some_variable = (&) x.buf >>> >>> into >>> >>> some_variable = __builtin_with_size ( (&) x.buf. x.len) >> >> Unless you use the address of x.Len this will not work when len is >> initialized after buf. And the address will not have a meaningful data >> dependence. >>> > > It would be a semantic requirement for this feature that > x.len needs to be initialized before x.buf is accessed.
Yes, that’s right, we might need to clarify this into the documentation of the counted_by. It should be a user error if the source code violate this rule. Qing > > Otherwise, I am not sure how to define the time point > at which x.len should be evaluated. > >>> So the later access to x.buf and not the initialization >>> of a member of the struct (which is too early). >> >>>> >>>> And indeed we need sth like a fat pointer to reliably solve all the issues. >>> >>> What happens for other languages such as FORTRAN >>> and ADA do? Are those pointers lowered in the FE? >> >> Yes >> >>> To me it seems there are two sound ways to introduce >>> such information: >>> >>> - either by using the type system. This works in >>> the FE in C using variably modified types >>> >>> char buf[n]; >>> __auto_type p = &buf; >>> >>> ... = sizeof (*p); >>> >>> But if I understand Jakob's comment to some PR >>> correctly the size information in the TREE_TYPE >>> is not processed correctly anymore in the >>> middle-end. >> >> The type based info is lowered during gimplification and in particular for >> pointer types the middle-end quickly loses track of the original type. >> > > Would it work if we make sure that we find a suitable > type? Or in other words, are the (non-constant) size > expressions inside it still useful in later passes? > > Martin > > >> Richard >> >>> >>> - or one injects the information via some >>> tree node or builtin at certain points in >>> time as suggested here, and the compiler >>> derives the information from these points >>> as tree-object-size does. >>> >>> >>> The use of attributes seems fragile and - looking >>> at the access attribute also overly complex. And >>> we somehow support this only for function types >>> and not elsewhere and also this then gets lost >>> during inlining. So I think for all this stuff >>> (nonnull, access, counted_by) I think a better >>> approach is needed. >>> >>> >>> Martin >>> >>> >>>> >>>> Richard >>> >>> >>> >>> >>>> >>>>> __builtin_with_size(x.buf, x.L); >>>>> >>>>> >>>>> Martin >>>>> >>>>> >>>>> >>>>>> >>>>>> Otherwise, we might need to take the “volatile” approach. >>>>>> >>>>>> Let me know your suggestion and comment. >>>>>> >>>>>> Thanks a lot. >>>>>> >>>>>> Qing >>>>>> >>>>>> >>>>>>> __bdos is the one such implicit user of the size parameter and you're >>>>>>> proposing to solve this by encoding the relationship between buffer and >>>>>>> size at the __bdos call site. But what about the case when the >>>>>>> instantiation of the object is not at the same place as the __bdos call >>>>>>> site, i.e. the DFA is unable to make that relationship? >>>>>>> >>>>>>> The example Martin showed where the subobject gets "hidden" behind a >>>>>>> pointer was a trivial one where DFA *may* actually work in practice >>>>>>> (because the object-size pass can thread through these assignments) but >>>>>>> think about this one: >>>>>>> >>>>>>> struct A >>>>>>> { >>>>>>> size_t size; >>>>>>> char buf[] __attribute__((counted_by(size))); >>>>>>> } >>>>>>> >>>>>>> static size_t >>>>>>> get_size_of (void *ptr) >>>>>>> { >>>>>>> return __bdos (ptr, 1); >>>>>>> } >>>>>>> >>>>>>> void >>>>>>> foo (size_t sz) >>>>>>> { >>>>>>> struct A *obj = __builtin_malloc (sz); >>>>>>> obj->size = sz; >>>>>>> >>>>>>> ... >>>>>>> __builtin_printf ("%zu\n", get_size_of (obj->array)); >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> Until get_size_of is inlined, no DFA can see the __bdos call in the >>>>>>> same place as the point where obj is allocated. As a result, the >>>>>>> assignment to obj->size could get reordered (or the store eliminated) >>>>>>> w.r.t. the __bdos call until the inlining happens. >>>>>>> >>>>>>> As a result, the relationship between buf and size established by the >>>>>>> attribute needs to be encoded into the type somehow. There are two >>>>>>> options: >>>>>>> >>>>>>> Option 1: Encode the relationship in the type of buf >>>>>>> >>>>>>> This is kinda what you end up doing with component_ref_has_counted_by >>>>>>> and it does show the relationship if one is looking (through that >>>>>>> call), but nothing more that can be used to, e.g. prevent reordering or >>>>>>> tell the optimizer that the reference to the buf member may imply a >>>>>>> reference to the size member as well. This could be remedied by >>>>>>> somehow encoding the USES relationship for size into the type of buf >>>>>>> that the optimization passes can see. I feel like this may be a bit >>>>>>> convoluted to specify in a future language extension in a way that will >>>>>>> actually be well understood by developers, but it will likely generate >>>>>>> faster runtime code. This will also likely require a bigger change >>>>>>> across passes. >>>>>>> >>>>>>> Option 2: Encode the relationship in the type of size >>>>>>> >>>>>>> The other option is to enhance the type of size somehow so that it >>>>>>> discourages reordering and store elimination, basically pessimizing >>>>>>> code. I think volatile semantics might be the way to do this and may >>>>>>> even be straightforward to specify in the future language extension >>>>>>> given that it builds on a known language construct and is thematically >>>>>>> related. However it does pessimize output for code that implements >>>>>>> __counted_by__. >>>>>>> >>>>>>> Thanks, >>>>>>> Sid >>>>>> >>>>> >>> >>> -- >>> Univ.-Prof. Dr. rer. nat. Martin Uecker >>> Graz University of Technology >>> Institute of Biomedical Imaging >>> >>> > > -- > Univ.-Prof. Dr. rer. nat. Martin Uecker > Graz University of Technology > Institute of Biomedical Imaging > >