Re: Functions that are CSEable but not pure

2012-10-12 Thread Joseph S. Myers
On Fri, 12 Oct 2012, Jason Merrill wrote:

> > Consider instead the following expansion:
> > 
> > __thread void (*i_initialize)(void) = i_init;
> 
> That would work too, though it wouldn't have the benefit of link-compatibility
> with __thread (for variables that don't need dynamic initialization) that the
> current scheme has.

I'd certainly think that for C, __thread and _Thread_local should be 
link-compatible (and probably should be treated more or less identically 
apart from some diagnostics), and that C11 _Thread_local should be 
link-compatible (in the absence of dynamic initialization) with C++11 
thread_local.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Functions that are CSEable but not pure

2012-10-12 Thread Jason Merrill

On 10/11/2012 03:48 PM, Alexandre Oliva wrote:

This is not entirely clear to me; where is i defined in the first place?


Um, it's defined where the user writes the definition, like any other 
variable.



Is it i_init that defines and tests i_initialized?


Yes.


Consider instead the following expansion:

__thread void (*i_initialize)(void) = i_init;


That would work too, though it wouldn't have the benefit of 
link-compatibility with __thread (for variables that don't need dynamic 
initialization) that the current scheme has.


I don't know whether it would be more or less efficient.


or this:

[EXTERN] __thread T *i;


Same answer.  If we aren't trying to maintain compatibility, this 
approach seems like it would be more efficient than i_initialize.


Jason



Re: Functions that are CSEable but not pure

2012-10-11 Thread Alexandre Oliva
On Oct 11, 2012, Jason Merrill  wrote:

> On 10/11/2012 11:14 AM, Alexandre Oliva wrote:
>> How about marking the singleton containing the call to the initializer
>> as always_inline, but not the initializer itself?
>> 
>> The compiler can then infer that initialized is set on the first inlined
>> call and optimize away subsequent tests and initializer calls
>> (call_some_function_that_may_modify_memory).

> That would require exporting the initialized flag in addition to the
> initializer function; currently it is private to the translation unit
> with the initializer function.  That is, the wrapper currently looks
> like

> int& i_wrap() { if (i_init) i_init(); return i; }

This is not entirely clear to me; where is i defined in the first place?
Is it i_init that defines and tests i_initialized?

I.e., do we translate:

[EXTERN] thread_local T i;

int f()
{
  T& j = i;
  ...
}

to:

[EXTERN] __thread T i;

#if !EXTERN
void
i_init()
{
  static __thread bool initialized;
  if (!initialized)
{
  initialized = true;
  i.T();
}
}
#endif

static T&
i_wrap ()
{
  if (i_init)
i_init ();
  return i;
}

int
f()
{
  T& j = i_wrap();
  ...
}

?


Consider instead the following expansion:

[EXTERN] __thread T i;

#if !EXTERN
static void i_init (void);

__thread void (*i_initialize)(void) = i_init;

static void
i_init ()
{
  i_initialize = NULL;
  i.T();
}
#else
EXTERN __thread void (*i_initialize)(void);
#endif

static T& __attribute__((__always_inline__))
i_wrap ()
{
  if (i_initialize)
{
  i_initialize ();
  // this is redundant, but it should enable removal of subsequent calls
  i_initialize = NULL;
}
  return i;
}

or this:

[EXTERN] __thread T *i;

#if !EXTERN
static __thread T i_data;

void
i_init ()
{
  i = &i_data; // or maybe allocate it dynamically?
  i->T();
}
#else
extern void i_init ();
#endif

static T& __attribute__((__always_inline__))
i_wrap ()
{
  if (!i)
i_init ();
  return *i; // dereference should enable cse
}

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: Functions that are CSEable but not pure

2012-10-11 Thread Jason Merrill

On 10/11/2012 11:14 AM, Alexandre Oliva wrote:

How about marking the singleton containing the call to the initializer
as always_inline, but not the initializer itself?

The compiler can then infer that initialized is set on the first inlined
call and optimize away subsequent tests and initializer calls
(call_some_function_that_may_modify_memory).


That would require exporting the initialized flag in addition to the 
initializer function; currently it is private to the translation unit 
with the initializer function.  That is, the wrapper currently looks like


int& i_wrap()
{
  if (i_init) i_init();
  return i;
}

and your suggestion would change it to

int& i_wrap()
{
  if (i_init && !i_initialized) { i_initialized = true; i_init(); }
  return i;
}

In previous discussions, people thought this would be less efficient.

Jason



Re: Functions that are CSEable but not pure

2012-10-11 Thread Alexandre Oliva
On Oct  3, 2012, Jason Merrill  wrote:

> int& lazy_i()
> {
>   static int i = init;
>   return i;
> }

> If the initialization is expensive or order-sensitive, this is a
> useful alternative to initialization on load

> An interesting property of such functions is that they only have
> side-effects the first time they are called, so subsequent calls can
> be optimized away to just use the return value of the first call.

> Currently there is no way to express this in GCC so that the
> optimizers know that multiple calls are redundant.


On Oct  4, 2012, Jakub Jelinek  wrote:

> The singleton function really is

> void singleton (void)
> {
>   static __thread bool initialized;
>   if (!initialized) {
> initialized = true;
> call_some_function_that_may_modify_memory ();
>   }
> }

> and has side effects just first time in a thread.

How about marking the singleton containing the call to the initializer
as always_inline, but not the initializer itself?

The compiler can then infer that initialized is set on the first inlined
call and optimize away subsequent tests and initializer calls
(call_some_function_that_may_modify_memory).

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: Functions that are CSEable but not pure

2012-10-05 Thread Jason Merrill

On 10/05/2012 04:20 PM, Xinliang David Li wrote:

So init () will be unconditionally called on entry to the wrapper?

In that case, you should probably go to the other extreme -- mark the
wrapper as noinline and expanded as if they are builtins -- but run
into the tricky issues of 'attributes' as discussed here.


I think we can leave the wrapper as a normal inline and use the same 
attribute on the init function; it also has the property of only having 
side-effects on the first call, it just doesn't return anything.


Jason



Re: Functions that are CSEable but not pure

2012-10-05 Thread Xinliang David Li
On Fri, Oct 5, 2012 at 1:12 PM, Jason Merrill  wrote:
> On 10/05/2012 03:46 PM, Xinliang David Li wrote:
>>
>> 1) As mentioned in this thread -- when the wrapper is inlined, the
>> pure attribute will be lost. It will give compiler hard time to
>> optimize away the guard code
>
>
> The wrapper doesn't actually check the guard; that happens in the init
> function, which is not inline.
>

So init () will be unconditionally called on entry to the wrapper?

In that case, you should probably go to the other extreme -- mark the
wrapper as noinline and expanded as if they are builtins -- but run
into the tricky issues of 'attributes' as discussed here.

David


> Jason
>


Re: Functions that are CSEable but not pure

2012-10-05 Thread Jason Merrill

On 10/05/2012 03:46 PM, Xinliang David Li wrote:

1) As mentioned in this thread -- when the wrapper is inlined, the
pure attribute will be lost. It will give compiler hard time to
optimize away the guard code


The wrapper doesn't actually check the guard; that happens in the init 
function, which is not inline.


Jason



Re: Functions that are CSEable but not pure

2012-10-05 Thread Xinliang David Li
The semantics of the 'first' reference of the TLS variable has
changed, and this change is introduced by the implementation. It is no
longer safe to do DCE as

thread_local int x = ..;

int foo()
{
   x;
  ...
}

but should be safe to do DSE as

int foo()
{
   x  = ...;  // Dead
   x = ...;
  ..
}

as well as CSE, PRE, PDE etc.


The bigger problems are

1) As mentioned in this thread -- when the wrapper is inlined, the
pure attribute will be lost. It will give compiler hard time to
optimize away the guard code
2) The runtime overhead of the guard code -- if the guard variable is also tls.

Speaking of 1), given two inline instances of the wrapper,

if (!init_guard)
   {
  init_guard = 1;
  do_init();
   }

 ... some code

if (!init_guard)
  {
init_guard =1 ;
do_init();
  }

If the compiler is taught that init_guard can not be clobbered by
anything other than the owner TLS variable's wrapper function, then
VRP should should be able to provide init_guard is not zero before the
second check thus can completely eliminate the if-block.
Inter-procedural elimination requires more work though.

so perhaps the solution is just to mark wrapper function always
inline? It will punish the code size for O0 build though as the init
block can not be optimized away.

thanks,

David



On Fri, Oct 5, 2012 at 10:24 AM, Jason Merrill  wrote:
> On 10/05/2012 04:46 AM, Richard Guenther wrote:
>>
>> Ok, so then let me make another example to try breaking a valid program
>> with the thread_local wrapper being pure:
>
>
> There is also my example earlier in the thread.
>
>
>>i;
>>if (init_count != 1)
>>  __builtin_abort ();
>>
>> is that valid?
>
>
> I believe so; that is an odr-use of i, so it needs to be initialized.
>
>
>> Thus my question is - is a valid program allowed to access side-effects
>> of the dynamic TLS initializer by not going through the TLS variable?
>
>
> I think so.
>
>
>> Preventing DCE but not CSE for const/pure functions can be for
>> example done by using the looping-const-or-pure flag (but that
>> also tells the compiler that this function may not return, so it is
>> very specific about the kind of possible side-effect to be preserved).
>> The very specific nature of thread_local TLS init semantics
>> ('somewhen before') is hard to make use of, so if we want to
>> change looping-const-or-pure to something like
>> const-or-pure-with-side-effects
>> we should constrain things more.
>
>
> That would be fine with me.
>
> Jason
>


Re: Functions that are CSEable but not pure

2012-10-05 Thread Jason Merrill

On 10/05/2012 04:46 AM, Richard Guenther wrote:

Ok, so then let me make another example to try breaking a valid program
with the thread_local wrapper being pure:


There is also my example earlier in the thread.


   i;
   if (init_count != 1)
 __builtin_abort ();

is that valid?


I believe so; that is an odr-use of i, so it needs to be initialized.


Thus my question is - is a valid program allowed to access side-effects
of the dynamic TLS initializer by not going through the TLS variable?


I think so.


Preventing DCE but not CSE for const/pure functions can be for
example done by using the looping-const-or-pure flag (but that
also tells the compiler that this function may not return, so it is
very specific about the kind of possible side-effect to be preserved).
The very specific nature of thread_local TLS init semantics
('somewhen before') is hard to make use of, so if we want to
change looping-const-or-pure to something like const-or-pure-with-side-effects
we should constrain things more.


That would be fine with me.

Jason



Re: Functions that are CSEable but not pure

2012-10-05 Thread Richard Guenther
On Thu, Oct 4, 2012 at 8:02 PM, Jason Merrill  wrote:
> On 10/04/2012 01:42 PM, Richard Guenther wrote:
>>
>> So I suppose the testcase that would be "valid" but break with using
>> pure would be instead
>>
>> int main()
>> {
>>int x = init_count;
>>int *p = get_me();
>>if (init_count == x)
>>  __builtin_abort();
>>int *q = get_me();
>>if (init_count == x)
>>  __builtin_abort();
>> }
>>
>> here when get_me is pure we CSE init_count over the _first_ call of
>> get_me.
>
>
> That's OK for C++ thread_local semantics; the initialization is specified to
> happen some time before the first use, so the testcase is making an invalid
> assumption.  This might not be desirable for user-written singleton
> functions, though.

Ok, so then let me make another example to try breaking a valid program
with the thread_local wrapper being pure:

int main()
{
  int &p = get_me();
  *p;
  if (init_count == 1)
__builtin_abort();
}

note my using of C++ references (which in this case cannot bind to NULL
and thus *p might not trap(?)).  So the C++ source would be something like

thread_local int init_count;
int foo () { init_count = 1; return 0; }
thread_local int i = foo ();

int main()
{
  i;
  if (init_count != 1)
__builtin_abort ();
}

is that valid?  When lowered to the above we would DCE the load of i
and the call to the initialization wrapper (because it is pure).  Obviously
then init_count is no longer 1.

Thus my question is - is a valid program allowed to access side-effects
of the dynamic TLS initializer by not going through the TLS variable?
I realize that's somewhat ill-formed if the TLS variable was a class with
a static method doing the access to init_count - using that method
would not keep the class instance life.

Preventing DCE but not CSE for const/pure functions can be for
example done by using the looping-const-or-pure flag (but that
also tells the compiler that this function may not return, so it is
very specific about the kind of possible side-effect to be preserved).
The very specific nature of thread_local TLS init semantics
('somewhen before') is hard to make use of, so if we want to
change looping-const-or-pure to something like const-or-pure-with-side-effects
we should constrain things more.

Richard.

> Jason
>


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 01:42 PM, Richard Guenther wrote:

So I suppose the testcase that would be "valid" but break with using
pure would be instead

int main()
{
   int x = init_count;
   int *p = get_me();
   if (init_count == x)
 __builtin_abort();
   int *q = get_me();
   if (init_count == x)
 __builtin_abort();
}

here when get_me is pure we CSE init_count over the _first_ call of get_me.


That's OK for C++ thread_local semantics; the initialization is 
specified to happen some time before the first use, so the testcase is 
making an invalid assumption.  This might not be desirable for 
user-written singleton functions, though.


Jason



Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 7:15 PM, Jakub Jelinek  wrote:
> On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote:
>> But isn't it a fact that it _cannot_ modify init_count?  If the second call
>> is CSEable then it cannot have side-effects that are observable at
>> the call site.  Is the following an example you would consider to fall
>> under your CSEing?
>>
>> int init_count;
>> int data;
>> int initialized;
>>
>> void init()
>> {
>>   if (!initialized)
>> {
>>   data = ++init_count;
>>   initialized = 1;
>> }
>> }
>>
>> inline int *get_me() __attribute ((pure));
>> inline int *get_me()
>> {
>>   init ();
>>   return &data;
>> }
>>
>> int sink;
>>
>> int main()
>> {
>>   sink = init_count;
>>   int *p = get_me();
>>   if (init_count != 1)
>> __builtin_abort();
>>   initialized = 0;
>>   int *q = get_me();
>>   if (init_count != 2)
>> __builtin_abort();
>>   return *p + *q;
>> }
>
> The above isn't a singleton function, as you are clearing initialized,
> therefore it doesn't have the properties the thread_local var get_address
> wrapper has.  The singleton function really is
> void singleton (void)
> {
>   static __thread bool initialized;
>   if (!initialized) {
> initialized = true;
> call_some_function_that_may_modify_memory ();
>   }
> }
> and has side effects just first time in a thread.

So I suppose the testcase that would be "valid" but break with using
pure would be instead

int init_count;
int data;

void init()
{
  static int initialized;
  if (!initialized)
{
  data = ++init_count;
  initialized = 1;
}
}

inline int *get_me() __attribute ((pure));
inline int *get_me()
{
  init ();
  return &data;
}

int main()
{
  int x = init_count;
  int *p = get_me();
  if (init_count == x)
__builtin_abort();
  int *q = get_me();
  if (init_count == x)
__builtin_abort();
}

here when get_me is pure we CSE init_count over the _first_ call of get_me.
Technically we only might CSE it over the second call of get_me.  Thus there
is flow-sensitive information coming out of nowhere when we try to skip
the second get_me () call in looking up the init_count load after it (coming
out of nowhere for the alias-oracle query which only includes 'init_count'
and the stmt 'int *q = get_me ()').

Thus my answer is, for the alias-oracle the hypothetical 'singleton_init'
attribute provides no useful information.  Thus CSE and/or DCE of
'singleton_init'
calls has to use special code (which I can think of being added to DCE, harder
to existing SCCVN, maybe easier to DOM) which will usually not fit into
the algorithm (so you might be able to teach FRE elimination of how to
elminiate the 2nd call to get_me () but you will need a 2nd FRE pass to
then figure out that init_count can be CSEd as well - which makes this
all very undesirable).  Inlining will also make this harder (you lose
the attribute and fall into the lucas situation ...).

At some point this "singleton"-ness should be auto-detected (there is
inlined code in SPEC 2k lucas that would benefit from constant propagation,
and SPEC 2k6 libquantum that uses a related concept - change / restore
of a global in a function to the net effect that the function should not be
considered clobbering it).

Richard.

> Jakub


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 01:15 PM, Richard Guenther wrote:

That is, I am confused about the distinction you seem to make between
the static variable 'initialized' and the global 'init_count'.


The distinction is that "initialized" is an implementation detail that 
once set is never cleared; it is the mechanism by which only the first 
call has side effects.


init_count is a normal user variable.

Jason



Re: Functions that are CSEable but not pure

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote:
> But isn't it a fact that it _cannot_ modify init_count?  If the second call
> is CSEable then it cannot have side-effects that are observable at
> the call site.  Is the following an example you would consider to fall
> under your CSEing?
> 
> int init_count;
> int data;
> int initialized;
> 
> void init()
> {
>   if (!initialized)
> {
>   data = ++init_count;
>   initialized = 1;
> }
> }
> 
> inline int *get_me() __attribute ((pure));
> inline int *get_me()
> {
>   init ();
>   return &data;
> }
> 
> int sink;
> 
> int main()
> {
>   sink = init_count;
>   int *p = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   initialized = 0;
>   int *q = get_me();
>   if (init_count != 2)
> __builtin_abort();
>   return *p + *q;
> }

The above isn't a singleton function, as you are clearing initialized,
therefore it doesn't have the properties the thread_local var get_address
wrapper has.  The singleton function really is
void singleton (void)
{
  static __thread bool initialized;
  if (!initialized) {
initialized = true;
call_some_function_that_may_modify_memory ();
  }
}
and has side effects just first time in a thread.

Jakub


Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 7:08 PM, Richard Guenther
 wrote:
> On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill  wrote:
>> On 10/04/2012 09:07 AM, Richard Guenther wrote:
>>>
>>> Ugh.  Especially with the above (you can DCE those calls) makes this
>>> severly mis-specified ... and any implementation error-prone (look what
>>> mess our losely defined 'malloc' attribute opened ...).
>>>
>>> I thought of a testcase like
>>>
>>>   int *p = get_me ();
>>>   .. = *p;
>>>   int *q = get_me ();
>>>   .. = *q;
>>>
>>> and get_me allocating/initalizing and returning a singleton.
>>
>>
>> Right.
>>
>>
>>> But you tell me it's more complicated and get_me () needs to
>>> be a barrier for any load/store (as it may modify arbitrary memory,
>>> but only on the "first" call).
>>
>>
>> Yes, because the initialization is user-written code.
>>
>>
>>> I think that "may modify arbitrary memory" isn't going to fly and
>>> my answer would be, better don't try to optimize anything here,
>>> at least not in generic terms.  How would you handle this in
>>> the alias oracle?  How would you then make CSE recognize
>>> two functions return the same value and are CSEable?
>>
>>
>> For aliasing purposes, the call is like a call to a normal function. For CSE
>> purposes, we want to recognize identical calls and combine them.  I don't
>> know the GCC bits well enough to be any more specific.
>>
>>
>>> Can you come up with a short but complete testcase illustrating the issue
>>> better (preferably C, so I don't need to read-in magic points where
>>> construction
>>> happens)?
>>
>>
>> int init_count;
>> int data;
>>
>> void init()
>> {
>>   static int initialized;
>>   if (!initialized)
>> {
>>   data = ++init_count;
>>   initialized = 1;
>> }
>> }
>>
>> inline int *get_me() __attribute ((pure));
>> inline int *get_me()
>> {
>>   init ();
>>   return &data;
>> }
>>
>> int sink;
>>
>> int main()
>> {
>>   sink = init_count;
>>   int *p = get_me();
>>   if (init_count != 1)
>> __builtin_abort();
>>   int *q = get_me();
>>   if (init_count != 1)
>> __builtin_abort();
>>   return *p + *q;
>> }
>>
>> On this testcase, gcc -O2 doesn't reload init_count after the call to get_me
>> because it thinks that the call can't have modified init_count. I want the
>> compiler to know that it is safe to discard the redundant assignment, but
>> not make assumptions about memory.
>
> But isn't it a fact that it _cannot_ modify init_count?  If the second call
> is CSEable then it cannot have side-effects that are observable at
> the call site.  Is the following an example you would consider to fall
> under your CSEing?
>
> int init_count;
> int data;
> int initialized;
>
> void init()
> {
>   if (!initialized)
> {
>   data = ++init_count;
>   initialized = 1;
> }
> }
>
> inline int *get_me() __attribute ((pure));
> inline int *get_me()
> {
>   init ();
>   return &data;
> }
>
> int sink;
>
> int main()
> {
>   sink = init_count;
>   int *p = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   initialized = 0;
>   int *q = get_me();
>   if (init_count != 2)
> __builtin_abort();
>   return *p + *q;
> }
>
> ?  If so, then why can we assume get_me returns the same pointer even here?

Or similar:

int main()
{
  sink = init_count;
  int *p = get_me();
  if (init_count != 1)
__builtin_abort();
}

is this required to not abort?  p is unused and with 'pure' you'd DCE the call.

That is, I am confused about the distinction you seem to make between
the static variable 'initialized' and the global 'init_count'.  You seem to
imply that the attribute would mean that we can CSE side-effects to
global memory but that those side-effects may be value-changing!

In practice for any CSE implementation this hypotetical "cse-side-effects"
attribute would only allow us to CSE if there are no intermediate
side-effects between two such calls, and as soon as the second get_me
call would be CSEd we'd also CSE the init_count value (which you
didn't want to CSE?!)

Still confused ;)

Richard.

> Richard.
>
>>
>> On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at
>> 08:56:03AM -0400, Jason Merrill wrote:
>>> Sure, but I thought you want to inline the wrapper function as soon as
>>> possible.  Or do you want to keep it as some kind of builtin that gets
>>> expanded during expansion to the test and call?
>>
>> Ah, I see your point; if get_me is inlined we end up with two calls to init,
>> so it would be good to mark init with the same hypothetical attribute.
>>
>> Jason
>>


Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill  wrote:
> On 10/04/2012 09:07 AM, Richard Guenther wrote:
>>
>> Ugh.  Especially with the above (you can DCE those calls) makes this
>> severly mis-specified ... and any implementation error-prone (look what
>> mess our losely defined 'malloc' attribute opened ...).
>>
>> I thought of a testcase like
>>
>>   int *p = get_me ();
>>   .. = *p;
>>   int *q = get_me ();
>>   .. = *q;
>>
>> and get_me allocating/initalizing and returning a singleton.
>
>
> Right.
>
>
>> But you tell me it's more complicated and get_me () needs to
>> be a barrier for any load/store (as it may modify arbitrary memory,
>> but only on the "first" call).
>
>
> Yes, because the initialization is user-written code.
>
>
>> I think that "may modify arbitrary memory" isn't going to fly and
>> my answer would be, better don't try to optimize anything here,
>> at least not in generic terms.  How would you handle this in
>> the alias oracle?  How would you then make CSE recognize
>> two functions return the same value and are CSEable?
>
>
> For aliasing purposes, the call is like a call to a normal function. For CSE
> purposes, we want to recognize identical calls and combine them.  I don't
> know the GCC bits well enough to be any more specific.
>
>
>> Can you come up with a short but complete testcase illustrating the issue
>> better (preferably C, so I don't need to read-in magic points where
>> construction
>> happens)?
>
>
> int init_count;
> int data;
>
> void init()
> {
>   static int initialized;
>   if (!initialized)
> {
>   data = ++init_count;
>   initialized = 1;
> }
> }
>
> inline int *get_me() __attribute ((pure));
> inline int *get_me()
> {
>   init ();
>   return &data;
> }
>
> int sink;
>
> int main()
> {
>   sink = init_count;
>   int *p = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   int *q = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   return *p + *q;
> }
>
> On this testcase, gcc -O2 doesn't reload init_count after the call to get_me
> because it thinks that the call can't have modified init_count. I want the
> compiler to know that it is safe to discard the redundant assignment, but
> not make assumptions about memory.

But isn't it a fact that it _cannot_ modify init_count?  If the second call
is CSEable then it cannot have side-effects that are observable at
the call site.  Is the following an example you would consider to fall
under your CSEing?

int init_count;
int data;
int initialized;

void init()
{
  if (!initialized)
{
  data = ++init_count;
  initialized = 1;
}
}

inline int *get_me() __attribute ((pure));
inline int *get_me()
{
  init ();
  return &data;
}

int sink;

int main()
{
  sink = init_count;
  int *p = get_me();
  if (init_count != 1)
__builtin_abort();
  initialized = 0;
  int *q = get_me();
  if (init_count != 2)
__builtin_abort();
  return *p + *q;
}

?  If so, then why can we assume get_me returns the same pointer even here?

Richard.

>
> On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at
> 08:56:03AM -0400, Jason Merrill wrote:
>> Sure, but I thought you want to inline the wrapper function as soon as
>> possible.  Or do you want to keep it as some kind of builtin that gets
>> expanded during expansion to the test and call?
>
> Ah, I see your point; if get_me is inlined we end up with two calls to init,
> so it would be good to mark init with the same hypothetical attribute.
>
> Jason
>


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 09:07 AM, Richard Guenther wrote:

Ugh.  Especially with the above (you can DCE those calls) makes this
severly mis-specified ... and any implementation error-prone (look what
mess our losely defined 'malloc' attribute opened ...).

I thought of a testcase like

  int *p = get_me ();
  .. = *p;
  int *q = get_me ();
  .. = *q;

and get_me allocating/initalizing and returning a singleton.


Right.


But you tell me it's more complicated and get_me () needs to
be a barrier for any load/store (as it may modify arbitrary memory,
but only on the "first" call).


Yes, because the initialization is user-written code.


I think that "may modify arbitrary memory" isn't going to fly and
my answer would be, better don't try to optimize anything here,
at least not in generic terms.  How would you handle this in
the alias oracle?  How would you then make CSE recognize
two functions return the same value and are CSEable?


For aliasing purposes, the call is like a call to a normal function. 
For CSE purposes, we want to recognize identical calls and combine them. 
 I don't know the GCC bits well enough to be any more specific.



Can you come up with a short but complete testcase illustrating the issue
better (preferably C, so I don't need to read-in magic points where construction
happens)?


int init_count;
int data;

void init()
{
  static int initialized;
  if (!initialized)
{
  data = ++init_count;
  initialized = 1;
}
}

inline int *get_me() __attribute ((pure));
inline int *get_me()
{
  init ();
  return &data;
}

int sink;

int main()
{
  sink = init_count;
  int *p = get_me();
  if (init_count != 1)
__builtin_abort();
  int *q = get_me();
  if (init_count != 1)
__builtin_abort();
  return *p + *q;
}

On this testcase, gcc -O2 doesn't reload init_count after the call to 
get_me because it thinks that the call can't have modified init_count. 
I want the compiler to know that it is safe to discard the redundant 
assignment, but not make assumptions about memory.


On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at 
08:56:03AM -0400, Jason Merrill wrote:

> Sure, but I thought you want to inline the wrapper function as soon as
> possible.  Or do you want to keep it as some kind of builtin that gets
> expanded during expansion to the test and call?

Ah, I see your point; if get_me is inlined we end up with two calls to 
init, so it would be good to mark init with the same hypothetical attribute.


Jason



Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 2:56 PM, Jason Merrill  wrote:
> On 10/04/2012 08:45 AM, Jakub Jelinek wrote:
>>
>> On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote:
>>>
>>> On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
>>> If the result is not needed, are we allowed to remove a call to this
>>> function?
>>
>>
>> No.  Unless you know the same function has been already called.
>
>
> Yes, we are.  If we optimize away a use of the variable, we can also remove
> the call to the function.  We can also hoist the call so its side-effects
> occur before other side-effects, as the standard only requires that the
> initialization of the variable occur some time between thread creation and
> the first use of the variable.
>
>
>>> So - what's wrong with using pure?  That it doesn't "feel" correct?
>>
>>
>> No, it can modify memory.
>
>
> Right, that's the issue.  The back end assumes that calls to pure functions
> won't clobber memory; calls to these functions can clobber arbitrary memory,
> but only on the first call.

Ugh.  Especially with the above (you can DCE those calls) makes this
severly mis-specified ... and any implementation error-prone (look what
mess our losely defined 'malloc' attribute opened ...).

I thought of a testcase like

 int *p = get_me ();
 .. = *p;
 int *q = get_me ();
 .. = *q;

and get_me allocating/initalizing and returning a singleton.

But you tell me it's more complicated and get_me () needs to
be a barrier for any load/store (as it may modify arbitrary memory,
but only on the "first" call).

I think that "may modify arbitrary memory" isn't going to fly and
my answer would be, better don't try to optimize anything here,
at least not in generic terms.  How would you handle this in
the alias oracle?  How would you then make CSE recognize
two functions return the same value and are CSEable?

>> I think the plan was for these functions not to return any value,
>
>
> No, I'm talking about the wrapper function which returns a reference to the
> variable (as in my example).

Can you come up with a short but complete testcase illustrating the issue
better (preferably C, so I don't need to read-in magic points where construction
happens)?

Thanks,
Richard.

> Jason
>


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 08:56:03AM -0400, Jason Merrill wrote:
> >I think the plan was for these functions not to return any value,
> 
> No, I'm talking about the wrapper function which returns a reference
> to the variable (as in my example).

Sure, but I thought you want to inline the wrapper function as soon as
possible.  Or do you want to keep it as some kind of builtin that gets
expanded during expansion to the test and call?

Jakub


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 08:45 AM, Jakub Jelinek wrote:

On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote:

On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
If the result is not needed, are we allowed to remove a call to this
function?


No.  Unless you know the same function has been already called.


Yes, we are.  If we optimize away a use of the variable, we can also 
remove the call to the function.  We can also hoist the call so its 
side-effects occur before other side-effects, as the standard only 
requires that the initialization of the variable occur some time between 
thread creation and the first use of the variable.



So - what's wrong with using pure?  That it doesn't "feel" correct?


No, it can modify memory.


Right, that's the issue.  The back end assumes that calls to pure 
functions won't clobber memory; calls to these functions can clobber 
arbitrary memory, but only on the first call.



I think the plan was for these functions not to return any value,


No, I'm talking about the wrapper function which returns a reference to 
the variable (as in my example).


Jason



Re: Functions that are CSEable but not pure

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote:
> On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
> If the result is not needed, are we allowed to remove a call to this
> function?

No.  Unless you know the same function has been already called.

> So - what's wrong with using pure?  That it doesn't "feel" correct?

No, it can modify memory.
Consider:

a.h:
extern int g;
struct S
{
  S () { __atomic_add_fetch (&g, 1, __ATOMIC_RELAXED); s = 7; }
  int s;
};
extern thread_local S s;

liba.C:
#include "a.h"
thread_local S s;

libb.C:
#include "a.h"

int g;

void
foo (S *&p, S *&q, int &a, int &b)
{
  a = g;
  p = &s; // this runs S::S() in the current thread
  b = g;
  q = &s;
}

I think the plan was for these functions not to return any value,
so that they could be aliases for all thread_local vars defined in the same
TU.  Then you really can't DCE the first call, you can the second call.
If it returns void, CSEable is probably a bad word.
Even if it returns value and the value is unused, it still needs to be
called the first time in a function, and can modify anything the ctor(s)
can modify.

Jakub


Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
> In C++ there is a common idiom called "initialize on first use".  In its
> simplest form it looks like
>
> int& lazy_i()
> {
>   static int i = init;
>   return i;
> }
>
> If the initialization is expensive or order-sensitive, this is a useful
> alternative to initialization on load
> (http://www.parashift.com/c++-faq/static-init-order-on-first-use.html).
>
> An interesting property of such functions is that they only have
> side-effects the first time they are called, so subsequent calls can be
> optimized away to just use the return value of the first call.

If the result is not needed, are we allowed to remove a call to this
function?  Basically, is the function allowed to "detect" if it was
called before?
If we can remove it, thus the function may not exploit knowledge on wheter
it was called before then we can also perform partial redundancy elimination in

 if ()
   lazy_i ();
 lazy_i ();

so I really hope we can also DCE these calls - otherwise their property would
be really weird, as CSE also involves a "DCE" piece:

 i = lazy_i ();
j = lazy_j ();

->
  i = lazy_i ();
  lazy_j ();
  j = i;

->

 i = lazy_i ();
 j = i;

But if we are allowed to DCE a lazy_i call with unused result then it really is
"pure".  Basically with this property we consider the static variable
it writes to
as local, as it cannot be inspected in any other way than by calling the
function or inspecting its return value.

So - what's wrong with using pure?  That it doesn't "feel" correct?
The I suppose
we should simply re-word its definition ;)

> Currently there is no way to express this in GCC so that the optimizers know
> that multiple calls are redundant.  Marking the function as pure causes
> calls to be CSEd as desired, but also asserts that the first call has no
> side-effects, which is not the case.
>
> My implementation of dynamic initialization of TLS variables as mandated by
> the C++11 and OpenMP standards uses this idiom implicitly, so I'd really
> like to be able to optimize away multiple calls.
>
> Right now we have the following ECF flags (among others):
>
> const = no read, no write, no side-effects, cseable
> pure = no write, no side-effects, cseable
> novops = no read, no write

note that I consider novops a red-herring.  novops is basically
'const' volatile, but we don't have  a volatile flag on calls (we
have it, but that's for the store to the result).  I'd rather get rid
of novops ...

> looping_const_or_pure = wait, actually there are side-effects
>
> Seems like the difference between novops and looping_const_or_pure is
> whether calls are subject to CSE.  Is that right?

No, novops and looping_const_or_pure tell you whether they are subject
to DCE, all const/pure functions are subject to CSE, novops functions
are not (as said above, novops is really a volatile const).

> Rather than checking these flag bundles, it seems to me that we ought to
> break them up into
>
> no reads
> no writes
> no side-effects
> cseable side-effects

no side-effects and cseable side-effects doesn't sound like a good
distinction to me - side-effects are not 'cse'-able or they would not
be side-effects.  But I suppose it all boils down to my question above,
where the PRE case is really the most interesting (can the formerly
second call "detect" it used the initialized path or the non-initialized
path?)

> So const  would be noread|nowrite|noside
>pure   would be nowrite|noside
> const|looping would be noread|nowrite|cseside
> pure|looping  would be nowrite|cseside
>novops would be noread|nowrite
>
> and the behavior I want would be just cseside.
>
> Does this make sense?  Anyone more familiar with this area of the code want
> to implement it for me?  :}

I'd say it's already implemented, just poorly documented.  Use pure!

Richard.

> Jason


Re: Functions that are CSEable but not pure

2012-10-03 Thread Jason Merrill

On 10/03/2012 07:06 PM, Joseph S. Myers wrote:

On Wed, 3 Oct 2012, Jason Merrill wrote:


My implementation of dynamic initialization of TLS variables as mandated by
the C++11 and OpenMP standards uses this idiom implicitly, so I'd really like
to be able to optimize away multiple calls.


I'd thought that dynamic initialization of TLS variables was supposed to
need ELF gABI, C++ ABI, binutils and glibc support as per Jakub's comment
in bug 27557 - have you found a way to avoid needing those, or are you
actually discussing something else?


Handling it via init-on-first-use avoids the need for ELF and binutils 
support.  The interface needs to be specified in the C++ ABI, and I we 
want glibc support for strict correctness (as with __cxa_atexit), but 
the functionality can be implemented without it.


Jason



Re: Functions that are CSEable but not pure

2012-10-03 Thread Joseph S. Myers
On Wed, 3 Oct 2012, Jason Merrill wrote:

> My implementation of dynamic initialization of TLS variables as mandated by
> the C++11 and OpenMP standards uses this idiom implicitly, so I'd really like
> to be able to optimize away multiple calls.

I'd thought that dynamic initialization of TLS variables was supposed to 
need ELF gABI, C++ ABI, binutils and glibc support as per Jakub's comment 
in bug 27557 - have you found a way to avoid needing those, or are you 
actually discussing something else?

-- 
Joseph S. Myers
jos...@codesourcery.com