Re: Functions that are CSEable but not pure
On Fri, 12 Oct 2012, Jason Merrill wrote: > > Consider instead the following expansion: > > > > __thread void (*i_initialize)(void) = i_init; > > That would work too, though it wouldn't have the benefit of link-compatibility > with __thread (for variables that don't need dynamic initialization) that the > current scheme has. I'd certainly think that for C, __thread and _Thread_local should be link-compatible (and probably should be treated more or less identically apart from some diagnostics), and that C11 _Thread_local should be link-compatible (in the absence of dynamic initialization) with C++11 thread_local. -- Joseph S. Myers jos...@codesourcery.com
Re: Functions that are CSEable but not pure
On 10/11/2012 03:48 PM, Alexandre Oliva wrote: This is not entirely clear to me; where is i defined in the first place? Um, it's defined where the user writes the definition, like any other variable. Is it i_init that defines and tests i_initialized? Yes. Consider instead the following expansion: __thread void (*i_initialize)(void) = i_init; That would work too, though it wouldn't have the benefit of link-compatibility with __thread (for variables that don't need dynamic initialization) that the current scheme has. I don't know whether it would be more or less efficient. or this: [EXTERN] __thread T *i; Same answer. If we aren't trying to maintain compatibility, this approach seems like it would be more efficient than i_initialize. Jason
Re: Functions that are CSEable but not pure
On Oct 11, 2012, Jason Merrill wrote: > On 10/11/2012 11:14 AM, Alexandre Oliva wrote: >> How about marking the singleton containing the call to the initializer >> as always_inline, but not the initializer itself? >> >> The compiler can then infer that initialized is set on the first inlined >> call and optimize away subsequent tests and initializer calls >> (call_some_function_that_may_modify_memory). > That would require exporting the initialized flag in addition to the > initializer function; currently it is private to the translation unit > with the initializer function. That is, the wrapper currently looks > like > int& i_wrap() { if (i_init) i_init(); return i; } This is not entirely clear to me; where is i defined in the first place? Is it i_init that defines and tests i_initialized? I.e., do we translate: [EXTERN] thread_local T i; int f() { T& j = i; ... } to: [EXTERN] __thread T i; #if !EXTERN void i_init() { static __thread bool initialized; if (!initialized) { initialized = true; i.T(); } } #endif static T& i_wrap () { if (i_init) i_init (); return i; } int f() { T& j = i_wrap(); ... } ? Consider instead the following expansion: [EXTERN] __thread T i; #if !EXTERN static void i_init (void); __thread void (*i_initialize)(void) = i_init; static void i_init () { i_initialize = NULL; i.T(); } #else EXTERN __thread void (*i_initialize)(void); #endif static T& __attribute__((__always_inline__)) i_wrap () { if (i_initialize) { i_initialize (); // this is redundant, but it should enable removal of subsequent calls i_initialize = NULL; } return i; } or this: [EXTERN] __thread T *i; #if !EXTERN static __thread T i_data; void i_init () { i = &i_data; // or maybe allocate it dynamically? i->T(); } #else extern void i_init (); #endif static T& __attribute__((__always_inline__)) i_wrap () { if (!i) i_init (); return *i; // dereference should enable cse } -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: Functions that are CSEable but not pure
On 10/11/2012 11:14 AM, Alexandre Oliva wrote: How about marking the singleton containing the call to the initializer as always_inline, but not the initializer itself? The compiler can then infer that initialized is set on the first inlined call and optimize away subsequent tests and initializer calls (call_some_function_that_may_modify_memory). That would require exporting the initialized flag in addition to the initializer function; currently it is private to the translation unit with the initializer function. That is, the wrapper currently looks like int& i_wrap() { if (i_init) i_init(); return i; } and your suggestion would change it to int& i_wrap() { if (i_init && !i_initialized) { i_initialized = true; i_init(); } return i; } In previous discussions, people thought this would be less efficient. Jason
Re: Functions that are CSEable but not pure
On Oct 3, 2012, Jason Merrill wrote: > int& lazy_i() > { > static int i = init; > return i; > } > If the initialization is expensive or order-sensitive, this is a > useful alternative to initialization on load > An interesting property of such functions is that they only have > side-effects the first time they are called, so subsequent calls can > be optimized away to just use the return value of the first call. > Currently there is no way to express this in GCC so that the > optimizers know that multiple calls are redundant. On Oct 4, 2012, Jakub Jelinek wrote: > The singleton function really is > void singleton (void) > { > static __thread bool initialized; > if (!initialized) { > initialized = true; > call_some_function_that_may_modify_memory (); > } > } > and has side effects just first time in a thread. How about marking the singleton containing the call to the initializer as always_inline, but not the initializer itself? The compiler can then infer that initialized is set on the first inlined call and optimize away subsequent tests and initializer calls (call_some_function_that_may_modify_memory). -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: Functions that are CSEable but not pure
On 10/05/2012 04:20 PM, Xinliang David Li wrote: So init () will be unconditionally called on entry to the wrapper? In that case, you should probably go to the other extreme -- mark the wrapper as noinline and expanded as if they are builtins -- but run into the tricky issues of 'attributes' as discussed here. I think we can leave the wrapper as a normal inline and use the same attribute on the init function; it also has the property of only having side-effects on the first call, it just doesn't return anything. Jason
Re: Functions that are CSEable but not pure
On Fri, Oct 5, 2012 at 1:12 PM, Jason Merrill wrote: > On 10/05/2012 03:46 PM, Xinliang David Li wrote: >> >> 1) As mentioned in this thread -- when the wrapper is inlined, the >> pure attribute will be lost. It will give compiler hard time to >> optimize away the guard code > > > The wrapper doesn't actually check the guard; that happens in the init > function, which is not inline. > So init () will be unconditionally called on entry to the wrapper? In that case, you should probably go to the other extreme -- mark the wrapper as noinline and expanded as if they are builtins -- but run into the tricky issues of 'attributes' as discussed here. David > Jason >
Re: Functions that are CSEable but not pure
On 10/05/2012 03:46 PM, Xinliang David Li wrote: 1) As mentioned in this thread -- when the wrapper is inlined, the pure attribute will be lost. It will give compiler hard time to optimize away the guard code The wrapper doesn't actually check the guard; that happens in the init function, which is not inline. Jason
Re: Functions that are CSEable but not pure
The semantics of the 'first' reference of the TLS variable has changed, and this change is introduced by the implementation. It is no longer safe to do DCE as thread_local int x = ..; int foo() { x; ... } but should be safe to do DSE as int foo() { x = ...; // Dead x = ...; .. } as well as CSE, PRE, PDE etc. The bigger problems are 1) As mentioned in this thread -- when the wrapper is inlined, the pure attribute will be lost. It will give compiler hard time to optimize away the guard code 2) The runtime overhead of the guard code -- if the guard variable is also tls. Speaking of 1), given two inline instances of the wrapper, if (!init_guard) { init_guard = 1; do_init(); } ... some code if (!init_guard) { init_guard =1 ; do_init(); } If the compiler is taught that init_guard can not be clobbered by anything other than the owner TLS variable's wrapper function, then VRP should should be able to provide init_guard is not zero before the second check thus can completely eliminate the if-block. Inter-procedural elimination requires more work though. so perhaps the solution is just to mark wrapper function always inline? It will punish the code size for O0 build though as the init block can not be optimized away. thanks, David On Fri, Oct 5, 2012 at 10:24 AM, Jason Merrill wrote: > On 10/05/2012 04:46 AM, Richard Guenther wrote: >> >> Ok, so then let me make another example to try breaking a valid program >> with the thread_local wrapper being pure: > > > There is also my example earlier in the thread. > > >>i; >>if (init_count != 1) >> __builtin_abort (); >> >> is that valid? > > > I believe so; that is an odr-use of i, so it needs to be initialized. > > >> Thus my question is - is a valid program allowed to access side-effects >> of the dynamic TLS initializer by not going through the TLS variable? > > > I think so. > > >> Preventing DCE but not CSE for const/pure functions can be for >> example done by using the looping-const-or-pure flag (but that >> also tells the compiler that this function may not return, so it is >> very specific about the kind of possible side-effect to be preserved). >> The very specific nature of thread_local TLS init semantics >> ('somewhen before') is hard to make use of, so if we want to >> change looping-const-or-pure to something like >> const-or-pure-with-side-effects >> we should constrain things more. > > > That would be fine with me. > > Jason >
Re: Functions that are CSEable but not pure
On 10/05/2012 04:46 AM, Richard Guenther wrote: Ok, so then let me make another example to try breaking a valid program with the thread_local wrapper being pure: There is also my example earlier in the thread. i; if (init_count != 1) __builtin_abort (); is that valid? I believe so; that is an odr-use of i, so it needs to be initialized. Thus my question is - is a valid program allowed to access side-effects of the dynamic TLS initializer by not going through the TLS variable? I think so. Preventing DCE but not CSE for const/pure functions can be for example done by using the looping-const-or-pure flag (but that also tells the compiler that this function may not return, so it is very specific about the kind of possible side-effect to be preserved). The very specific nature of thread_local TLS init semantics ('somewhen before') is hard to make use of, so if we want to change looping-const-or-pure to something like const-or-pure-with-side-effects we should constrain things more. That would be fine with me. Jason
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 8:02 PM, Jason Merrill wrote: > On 10/04/2012 01:42 PM, Richard Guenther wrote: >> >> So I suppose the testcase that would be "valid" but break with using >> pure would be instead >> >> int main() >> { >>int x = init_count; >>int *p = get_me(); >>if (init_count == x) >> __builtin_abort(); >>int *q = get_me(); >>if (init_count == x) >> __builtin_abort(); >> } >> >> here when get_me is pure we CSE init_count over the _first_ call of >> get_me. > > > That's OK for C++ thread_local semantics; the initialization is specified to > happen some time before the first use, so the testcase is making an invalid > assumption. This might not be desirable for user-written singleton > functions, though. Ok, so then let me make another example to try breaking a valid program with the thread_local wrapper being pure: int main() { int &p = get_me(); *p; if (init_count == 1) __builtin_abort(); } note my using of C++ references (which in this case cannot bind to NULL and thus *p might not trap(?)). So the C++ source would be something like thread_local int init_count; int foo () { init_count = 1; return 0; } thread_local int i = foo (); int main() { i; if (init_count != 1) __builtin_abort (); } is that valid? When lowered to the above we would DCE the load of i and the call to the initialization wrapper (because it is pure). Obviously then init_count is no longer 1. Thus my question is - is a valid program allowed to access side-effects of the dynamic TLS initializer by not going through the TLS variable? I realize that's somewhat ill-formed if the TLS variable was a class with a static method doing the access to init_count - using that method would not keep the class instance life. Preventing DCE but not CSE for const/pure functions can be for example done by using the looping-const-or-pure flag (but that also tells the compiler that this function may not return, so it is very specific about the kind of possible side-effect to be preserved). The very specific nature of thread_local TLS init semantics ('somewhen before') is hard to make use of, so if we want to change looping-const-or-pure to something like const-or-pure-with-side-effects we should constrain things more. Richard. > Jason >
Re: Functions that are CSEable but not pure
On 10/04/2012 01:42 PM, Richard Guenther wrote: So I suppose the testcase that would be "valid" but break with using pure would be instead int main() { int x = init_count; int *p = get_me(); if (init_count == x) __builtin_abort(); int *q = get_me(); if (init_count == x) __builtin_abort(); } here when get_me is pure we CSE init_count over the _first_ call of get_me. That's OK for C++ thread_local semantics; the initialization is specified to happen some time before the first use, so the testcase is making an invalid assumption. This might not be desirable for user-written singleton functions, though. Jason
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 7:15 PM, Jakub Jelinek wrote: > On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote: >> But isn't it a fact that it _cannot_ modify init_count? If the second call >> is CSEable then it cannot have side-effects that are observable at >> the call site. Is the following an example you would consider to fall >> under your CSEing? >> >> int init_count; >> int data; >> int initialized; >> >> void init() >> { >> if (!initialized) >> { >> data = ++init_count; >> initialized = 1; >> } >> } >> >> inline int *get_me() __attribute ((pure)); >> inline int *get_me() >> { >> init (); >> return &data; >> } >> >> int sink; >> >> int main() >> { >> sink = init_count; >> int *p = get_me(); >> if (init_count != 1) >> __builtin_abort(); >> initialized = 0; >> int *q = get_me(); >> if (init_count != 2) >> __builtin_abort(); >> return *p + *q; >> } > > The above isn't a singleton function, as you are clearing initialized, > therefore it doesn't have the properties the thread_local var get_address > wrapper has. The singleton function really is > void singleton (void) > { > static __thread bool initialized; > if (!initialized) { > initialized = true; > call_some_function_that_may_modify_memory (); > } > } > and has side effects just first time in a thread. So I suppose the testcase that would be "valid" but break with using pure would be instead int init_count; int data; void init() { static int initialized; if (!initialized) { data = ++init_count; initialized = 1; } } inline int *get_me() __attribute ((pure)); inline int *get_me() { init (); return &data; } int main() { int x = init_count; int *p = get_me(); if (init_count == x) __builtin_abort(); int *q = get_me(); if (init_count == x) __builtin_abort(); } here when get_me is pure we CSE init_count over the _first_ call of get_me. Technically we only might CSE it over the second call of get_me. Thus there is flow-sensitive information coming out of nowhere when we try to skip the second get_me () call in looking up the init_count load after it (coming out of nowhere for the alias-oracle query which only includes 'init_count' and the stmt 'int *q = get_me ()'). Thus my answer is, for the alias-oracle the hypothetical 'singleton_init' attribute provides no useful information. Thus CSE and/or DCE of 'singleton_init' calls has to use special code (which I can think of being added to DCE, harder to existing SCCVN, maybe easier to DOM) which will usually not fit into the algorithm (so you might be able to teach FRE elimination of how to elminiate the 2nd call to get_me () but you will need a 2nd FRE pass to then figure out that init_count can be CSEd as well - which makes this all very undesirable). Inlining will also make this harder (you lose the attribute and fall into the lucas situation ...). At some point this "singleton"-ness should be auto-detected (there is inlined code in SPEC 2k lucas that would benefit from constant propagation, and SPEC 2k6 libquantum that uses a related concept - change / restore of a global in a function to the net effect that the function should not be considered clobbering it). Richard. > Jakub
Re: Functions that are CSEable but not pure
On 10/04/2012 01:15 PM, Richard Guenther wrote: That is, I am confused about the distinction you seem to make between the static variable 'initialized' and the global 'init_count'. The distinction is that "initialized" is an implementation detail that once set is never cleared; it is the mechanism by which only the first call has side effects. init_count is a normal user variable. Jason
Re: Functions that are CSEable but not pure
On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote: > But isn't it a fact that it _cannot_ modify init_count? If the second call > is CSEable then it cannot have side-effects that are observable at > the call site. Is the following an example you would consider to fall > under your CSEing? > > int init_count; > int data; > int initialized; > > void init() > { > if (!initialized) > { > data = ++init_count; > initialized = 1; > } > } > > inline int *get_me() __attribute ((pure)); > inline int *get_me() > { > init (); > return &data; > } > > int sink; > > int main() > { > sink = init_count; > int *p = get_me(); > if (init_count != 1) > __builtin_abort(); > initialized = 0; > int *q = get_me(); > if (init_count != 2) > __builtin_abort(); > return *p + *q; > } The above isn't a singleton function, as you are clearing initialized, therefore it doesn't have the properties the thread_local var get_address wrapper has. The singleton function really is void singleton (void) { static __thread bool initialized; if (!initialized) { initialized = true; call_some_function_that_may_modify_memory (); } } and has side effects just first time in a thread. Jakub
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 7:08 PM, Richard Guenther wrote: > On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill wrote: >> On 10/04/2012 09:07 AM, Richard Guenther wrote: >>> >>> Ugh. Especially with the above (you can DCE those calls) makes this >>> severly mis-specified ... and any implementation error-prone (look what >>> mess our losely defined 'malloc' attribute opened ...). >>> >>> I thought of a testcase like >>> >>> int *p = get_me (); >>> .. = *p; >>> int *q = get_me (); >>> .. = *q; >>> >>> and get_me allocating/initalizing and returning a singleton. >> >> >> Right. >> >> >>> But you tell me it's more complicated and get_me () needs to >>> be a barrier for any load/store (as it may modify arbitrary memory, >>> but only on the "first" call). >> >> >> Yes, because the initialization is user-written code. >> >> >>> I think that "may modify arbitrary memory" isn't going to fly and >>> my answer would be, better don't try to optimize anything here, >>> at least not in generic terms. How would you handle this in >>> the alias oracle? How would you then make CSE recognize >>> two functions return the same value and are CSEable? >> >> >> For aliasing purposes, the call is like a call to a normal function. For CSE >> purposes, we want to recognize identical calls and combine them. I don't >> know the GCC bits well enough to be any more specific. >> >> >>> Can you come up with a short but complete testcase illustrating the issue >>> better (preferably C, so I don't need to read-in magic points where >>> construction >>> happens)? >> >> >> int init_count; >> int data; >> >> void init() >> { >> static int initialized; >> if (!initialized) >> { >> data = ++init_count; >> initialized = 1; >> } >> } >> >> inline int *get_me() __attribute ((pure)); >> inline int *get_me() >> { >> init (); >> return &data; >> } >> >> int sink; >> >> int main() >> { >> sink = init_count; >> int *p = get_me(); >> if (init_count != 1) >> __builtin_abort(); >> int *q = get_me(); >> if (init_count != 1) >> __builtin_abort(); >> return *p + *q; >> } >> >> On this testcase, gcc -O2 doesn't reload init_count after the call to get_me >> because it thinks that the call can't have modified init_count. I want the >> compiler to know that it is safe to discard the redundant assignment, but >> not make assumptions about memory. > > But isn't it a fact that it _cannot_ modify init_count? If the second call > is CSEable then it cannot have side-effects that are observable at > the call site. Is the following an example you would consider to fall > under your CSEing? > > int init_count; > int data; > int initialized; > > void init() > { > if (!initialized) > { > data = ++init_count; > initialized = 1; > } > } > > inline int *get_me() __attribute ((pure)); > inline int *get_me() > { > init (); > return &data; > } > > int sink; > > int main() > { > sink = init_count; > int *p = get_me(); > if (init_count != 1) > __builtin_abort(); > initialized = 0; > int *q = get_me(); > if (init_count != 2) > __builtin_abort(); > return *p + *q; > } > > ? If so, then why can we assume get_me returns the same pointer even here? Or similar: int main() { sink = init_count; int *p = get_me(); if (init_count != 1) __builtin_abort(); } is this required to not abort? p is unused and with 'pure' you'd DCE the call. That is, I am confused about the distinction you seem to make between the static variable 'initialized' and the global 'init_count'. You seem to imply that the attribute would mean that we can CSE side-effects to global memory but that those side-effects may be value-changing! In practice for any CSE implementation this hypotetical "cse-side-effects" attribute would only allow us to CSE if there are no intermediate side-effects between two such calls, and as soon as the second get_me call would be CSEd we'd also CSE the init_count value (which you didn't want to CSE?!) Still confused ;) Richard. > Richard. > >> >> On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at >> 08:56:03AM -0400, Jason Merrill wrote: >>> Sure, but I thought you want to inline the wrapper function as soon as >>> possible. Or do you want to keep it as some kind of builtin that gets >>> expanded during expansion to the test and call? >> >> Ah, I see your point; if get_me is inlined we end up with two calls to init, >> so it would be good to mark init with the same hypothetical attribute. >> >> Jason >>
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill wrote: > On 10/04/2012 09:07 AM, Richard Guenther wrote: >> >> Ugh. Especially with the above (you can DCE those calls) makes this >> severly mis-specified ... and any implementation error-prone (look what >> mess our losely defined 'malloc' attribute opened ...). >> >> I thought of a testcase like >> >> int *p = get_me (); >> .. = *p; >> int *q = get_me (); >> .. = *q; >> >> and get_me allocating/initalizing and returning a singleton. > > > Right. > > >> But you tell me it's more complicated and get_me () needs to >> be a barrier for any load/store (as it may modify arbitrary memory, >> but only on the "first" call). > > > Yes, because the initialization is user-written code. > > >> I think that "may modify arbitrary memory" isn't going to fly and >> my answer would be, better don't try to optimize anything here, >> at least not in generic terms. How would you handle this in >> the alias oracle? How would you then make CSE recognize >> two functions return the same value and are CSEable? > > > For aliasing purposes, the call is like a call to a normal function. For CSE > purposes, we want to recognize identical calls and combine them. I don't > know the GCC bits well enough to be any more specific. > > >> Can you come up with a short but complete testcase illustrating the issue >> better (preferably C, so I don't need to read-in magic points where >> construction >> happens)? > > > int init_count; > int data; > > void init() > { > static int initialized; > if (!initialized) > { > data = ++init_count; > initialized = 1; > } > } > > inline int *get_me() __attribute ((pure)); > inline int *get_me() > { > init (); > return &data; > } > > int sink; > > int main() > { > sink = init_count; > int *p = get_me(); > if (init_count != 1) > __builtin_abort(); > int *q = get_me(); > if (init_count != 1) > __builtin_abort(); > return *p + *q; > } > > On this testcase, gcc -O2 doesn't reload init_count after the call to get_me > because it thinks that the call can't have modified init_count. I want the > compiler to know that it is safe to discard the redundant assignment, but > not make assumptions about memory. But isn't it a fact that it _cannot_ modify init_count? If the second call is CSEable then it cannot have side-effects that are observable at the call site. Is the following an example you would consider to fall under your CSEing? int init_count; int data; int initialized; void init() { if (!initialized) { data = ++init_count; initialized = 1; } } inline int *get_me() __attribute ((pure)); inline int *get_me() { init (); return &data; } int sink; int main() { sink = init_count; int *p = get_me(); if (init_count != 1) __builtin_abort(); initialized = 0; int *q = get_me(); if (init_count != 2) __builtin_abort(); return *p + *q; } ? If so, then why can we assume get_me returns the same pointer even here? Richard. > > On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at > 08:56:03AM -0400, Jason Merrill wrote: >> Sure, but I thought you want to inline the wrapper function as soon as >> possible. Or do you want to keep it as some kind of builtin that gets >> expanded during expansion to the test and call? > > Ah, I see your point; if get_me is inlined we end up with two calls to init, > so it would be good to mark init with the same hypothetical attribute. > > Jason >
Re: Functions that are CSEable but not pure
On 10/04/2012 09:07 AM, Richard Guenther wrote: Ugh. Especially with the above (you can DCE those calls) makes this severly mis-specified ... and any implementation error-prone (look what mess our losely defined 'malloc' attribute opened ...). I thought of a testcase like int *p = get_me (); .. = *p; int *q = get_me (); .. = *q; and get_me allocating/initalizing and returning a singleton. Right. But you tell me it's more complicated and get_me () needs to be a barrier for any load/store (as it may modify arbitrary memory, but only on the "first" call). Yes, because the initialization is user-written code. I think that "may modify arbitrary memory" isn't going to fly and my answer would be, better don't try to optimize anything here, at least not in generic terms. How would you handle this in the alias oracle? How would you then make CSE recognize two functions return the same value and are CSEable? For aliasing purposes, the call is like a call to a normal function. For CSE purposes, we want to recognize identical calls and combine them. I don't know the GCC bits well enough to be any more specific. Can you come up with a short but complete testcase illustrating the issue better (preferably C, so I don't need to read-in magic points where construction happens)? int init_count; int data; void init() { static int initialized; if (!initialized) { data = ++init_count; initialized = 1; } } inline int *get_me() __attribute ((pure)); inline int *get_me() { init (); return &data; } int sink; int main() { sink = init_count; int *p = get_me(); if (init_count != 1) __builtin_abort(); int *q = get_me(); if (init_count != 1) __builtin_abort(); return *p + *q; } On this testcase, gcc -O2 doesn't reload init_count after the call to get_me because it thinks that the call can't have modified init_count. I want the compiler to know that it is safe to discard the redundant assignment, but not make assumptions about memory. On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at 08:56:03AM -0400, Jason Merrill wrote: > Sure, but I thought you want to inline the wrapper function as soon as > possible. Or do you want to keep it as some kind of builtin that gets > expanded during expansion to the test and call? Ah, I see your point; if get_me is inlined we end up with two calls to init, so it would be good to mark init with the same hypothetical attribute. Jason
Re: Functions that are CSEable but not pure
On Thu, Oct 4, 2012 at 2:56 PM, Jason Merrill wrote: > On 10/04/2012 08:45 AM, Jakub Jelinek wrote: >> >> On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote: >>> >>> On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: >>> If the result is not needed, are we allowed to remove a call to this >>> function? >> >> >> No. Unless you know the same function has been already called. > > > Yes, we are. If we optimize away a use of the variable, we can also remove > the call to the function. We can also hoist the call so its side-effects > occur before other side-effects, as the standard only requires that the > initialization of the variable occur some time between thread creation and > the first use of the variable. > > >>> So - what's wrong with using pure? That it doesn't "feel" correct? >> >> >> No, it can modify memory. > > > Right, that's the issue. The back end assumes that calls to pure functions > won't clobber memory; calls to these functions can clobber arbitrary memory, > but only on the first call. Ugh. Especially with the above (you can DCE those calls) makes this severly mis-specified ... and any implementation error-prone (look what mess our losely defined 'malloc' attribute opened ...). I thought of a testcase like int *p = get_me (); .. = *p; int *q = get_me (); .. = *q; and get_me allocating/initalizing and returning a singleton. But you tell me it's more complicated and get_me () needs to be a barrier for any load/store (as it may modify arbitrary memory, but only on the "first" call). I think that "may modify arbitrary memory" isn't going to fly and my answer would be, better don't try to optimize anything here, at least not in generic terms. How would you handle this in the alias oracle? How would you then make CSE recognize two functions return the same value and are CSEable? >> I think the plan was for these functions not to return any value, > > > No, I'm talking about the wrapper function which returns a reference to the > variable (as in my example). Can you come up with a short but complete testcase illustrating the issue better (preferably C, so I don't need to read-in magic points where construction happens)? Thanks, Richard. > Jason >
Re: Functions that are CSEable but not pure
On Thu, Oct 04, 2012 at 08:56:03AM -0400, Jason Merrill wrote: > >I think the plan was for these functions not to return any value, > > No, I'm talking about the wrapper function which returns a reference > to the variable (as in my example). Sure, but I thought you want to inline the wrapper function as soon as possible. Or do you want to keep it as some kind of builtin that gets expanded during expansion to the test and call? Jakub
Re: Functions that are CSEable but not pure
On 10/04/2012 08:45 AM, Jakub Jelinek wrote: On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote: On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: If the result is not needed, are we allowed to remove a call to this function? No. Unless you know the same function has been already called. Yes, we are. If we optimize away a use of the variable, we can also remove the call to the function. We can also hoist the call so its side-effects occur before other side-effects, as the standard only requires that the initialization of the variable occur some time between thread creation and the first use of the variable. So - what's wrong with using pure? That it doesn't "feel" correct? No, it can modify memory. Right, that's the issue. The back end assumes that calls to pure functions won't clobber memory; calls to these functions can clobber arbitrary memory, but only on the first call. I think the plan was for these functions not to return any value, No, I'm talking about the wrapper function which returns a reference to the variable (as in my example). Jason
Re: Functions that are CSEable but not pure
On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote: > On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: > If the result is not needed, are we allowed to remove a call to this > function? No. Unless you know the same function has been already called. > So - what's wrong with using pure? That it doesn't "feel" correct? No, it can modify memory. Consider: a.h: extern int g; struct S { S () { __atomic_add_fetch (&g, 1, __ATOMIC_RELAXED); s = 7; } int s; }; extern thread_local S s; liba.C: #include "a.h" thread_local S s; libb.C: #include "a.h" int g; void foo (S *&p, S *&q, int &a, int &b) { a = g; p = &s; // this runs S::S() in the current thread b = g; q = &s; } I think the plan was for these functions not to return any value, so that they could be aliases for all thread_local vars defined in the same TU. Then you really can't DCE the first call, you can the second call. If it returns void, CSEable is probably a bad word. Even if it returns value and the value is unused, it still needs to be called the first time in a function, and can modify anything the ctor(s) can modify. Jakub
Re: Functions that are CSEable but not pure
On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill wrote: > In C++ there is a common idiom called "initialize on first use". In its > simplest form it looks like > > int& lazy_i() > { > static int i = init; > return i; > } > > If the initialization is expensive or order-sensitive, this is a useful > alternative to initialization on load > (http://www.parashift.com/c++-faq/static-init-order-on-first-use.html). > > An interesting property of such functions is that they only have > side-effects the first time they are called, so subsequent calls can be > optimized away to just use the return value of the first call. If the result is not needed, are we allowed to remove a call to this function? Basically, is the function allowed to "detect" if it was called before? If we can remove it, thus the function may not exploit knowledge on wheter it was called before then we can also perform partial redundancy elimination in if () lazy_i (); lazy_i (); so I really hope we can also DCE these calls - otherwise their property would be really weird, as CSE also involves a "DCE" piece: i = lazy_i (); j = lazy_j (); -> i = lazy_i (); lazy_j (); j = i; -> i = lazy_i (); j = i; But if we are allowed to DCE a lazy_i call with unused result then it really is "pure". Basically with this property we consider the static variable it writes to as local, as it cannot be inspected in any other way than by calling the function or inspecting its return value. So - what's wrong with using pure? That it doesn't "feel" correct? The I suppose we should simply re-word its definition ;) > Currently there is no way to express this in GCC so that the optimizers know > that multiple calls are redundant. Marking the function as pure causes > calls to be CSEd as desired, but also asserts that the first call has no > side-effects, which is not the case. > > My implementation of dynamic initialization of TLS variables as mandated by > the C++11 and OpenMP standards uses this idiom implicitly, so I'd really > like to be able to optimize away multiple calls. > > Right now we have the following ECF flags (among others): > > const = no read, no write, no side-effects, cseable > pure = no write, no side-effects, cseable > novops = no read, no write note that I consider novops a red-herring. novops is basically 'const' volatile, but we don't have a volatile flag on calls (we have it, but that's for the store to the result). I'd rather get rid of novops ... > looping_const_or_pure = wait, actually there are side-effects > > Seems like the difference between novops and looping_const_or_pure is > whether calls are subject to CSE. Is that right? No, novops and looping_const_or_pure tell you whether they are subject to DCE, all const/pure functions are subject to CSE, novops functions are not (as said above, novops is really a volatile const). > Rather than checking these flag bundles, it seems to me that we ought to > break them up into > > no reads > no writes > no side-effects > cseable side-effects no side-effects and cseable side-effects doesn't sound like a good distinction to me - side-effects are not 'cse'-able or they would not be side-effects. But I suppose it all boils down to my question above, where the PRE case is really the most interesting (can the formerly second call "detect" it used the initialized path or the non-initialized path?) > So const would be noread|nowrite|noside >pure would be nowrite|noside > const|looping would be noread|nowrite|cseside > pure|looping would be nowrite|cseside >novops would be noread|nowrite > > and the behavior I want would be just cseside. > > Does this make sense? Anyone more familiar with this area of the code want > to implement it for me? :} I'd say it's already implemented, just poorly documented. Use pure! Richard. > Jason
Re: Functions that are CSEable but not pure
On 10/03/2012 07:06 PM, Joseph S. Myers wrote: On Wed, 3 Oct 2012, Jason Merrill wrote: My implementation of dynamic initialization of TLS variables as mandated by the C++11 and OpenMP standards uses this idiom implicitly, so I'd really like to be able to optimize away multiple calls. I'd thought that dynamic initialization of TLS variables was supposed to need ELF gABI, C++ ABI, binutils and glibc support as per Jakub's comment in bug 27557 - have you found a way to avoid needing those, or are you actually discussing something else? Handling it via init-on-first-use avoids the need for ELF and binutils support. The interface needs to be specified in the C++ ABI, and I we want glibc support for strict correctness (as with __cxa_atexit), but the functionality can be implemented without it. Jason
Re: Functions that are CSEable but not pure
On Wed, 3 Oct 2012, Jason Merrill wrote: > My implementation of dynamic initialization of TLS variables as mandated by > the C++11 and OpenMP standards uses this idiom implicitly, so I'd really like > to be able to optimize away multiple calls. I'd thought that dynamic initialization of TLS variables was supposed to need ELF gABI, C++ ABI, binutils and glibc support as per Jakub's comment in bug 27557 - have you found a way to avoid needing those, or are you actually discussing something else? -- Joseph S. Myers jos...@codesourcery.com