Re: More atomic functions please

2011-05-13 Thread Jakub Jelinek
On Fri, May 13, 2011 at 07:55:44AM +0200, Piotr Wyderski wrote:
 Jakub Jelinek wrote:
 
   /* X86_TUNE_USE_INCDEC */
   ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
 
  So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl,
  if addl is better (e.g. on Atom incl is very bad compared to addl $1),
  it will generate it.
 
 Why is lock inc/dec worse than add/sub on Core2I7?
 The only difference I know of is the way the carry flag
 is handled.

Yeah, and that is exactly the problem, instructions that only sets a subset
of the flags is problematic.
See e.g. Intel's 248966.pdf, 3.5.1.1 Use of the INC and DEC Instructions.

Jakub


Re: More atomic functions please

2011-05-13 Thread Piotr Wyderski
Jakub Jelinek wrote:

 And that's the right thing to do.

I concur. But the exchange case remains open.

 Please file an enhancement request in gcc bugzilla.

Done, 48986. I have also noticed several other missing
optimizations in this area, so I'm about to report them too.

Best regards,
Piotr Wyderski


Re: More atomic functions please

2011-05-12 Thread Joseph S. Myers
On Thu, 12 May 2011, Piotr Wyderski wrote:

 Hello,
 
 I'm not sure if it should be better handled as missed optimization,
 but there is a certain lack of functionality in the GCC's __sync_*
 function family.

I don't think we should add new functions to that family; instead the aim 
should be to implement the functionality (built-in functions etc.) 
required for a good implementation of the C1x and C++0x atomics support, 
and recommend users to use those in future.

http://gcc.gnu.org/wiki/Atomic

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: More atomic functions please

2011-05-12 Thread Jakub Jelinek
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
 Hello,
 
 I'm not sure if it should be better handled as missed optimization,
 but there is a certain lack of functionality in the GCC's __sync_*
 function family.
 
 When implementing a reference counting smart pointer, two operations
 are of crucial importance:
 
 void __sync_increment(T* p);
 bool __sync_decrement_iszero(T* p);
 
 The former only increments the location pointed to by p, the latter decrements
 it and returns true if and only if the result was zero.
 
 Both can be implemented in terms of existing __sync functions (and what
 can't? -- since there is __sync_bool_compare_and_swap()), e.g.:
 
void __sync_increment(T* p) {
 
   __sync_fetch_and_add(p, 1);
}
 
   bool __sync_decrement(T* p) {
 
  return __sync_fetch_and_add(p, -1) == 1;
   }

And that's the right thing to do.  If the generated code is not optimal, we 
should
just improve __sync_fetch_and_add code generation.
It isn't hard to special case addition of 1 or -1, and we already care whether
the result of the builtin is used or ignored.  For == 1 we could add some 
pattern that
combiner would merge.

Please file an enhancement request in gcc bugzilla.

Jakub


Re: More atomic functions please

2011-05-12 Thread Jakub Jelinek
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
 Unfortunately, onx86/x64 both are compiled in a rather poor way:
 
 __sync_increment:
 
 lock addl $x01,(ptr)
 
 which is longer than:
 
lock incl (ptr)

GCC actually generates lock incl (ptr) already now, it just depends
on which CPU you optimize for.
  /* X86_TUNE_USE_INCDEC */
  ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),

So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl,
if addl is better (e.g. on Atom incl is very bad compared to addl $1),
it will generate it.

Jakub


Re: More atomic functions please

2011-05-12 Thread Piotr Wyderski
Jakub Jelinek wrote:

  /* X86_TUNE_USE_INCDEC */
  ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),

 So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl,
 if addl is better (e.g. on Atom incl is very bad compared to addl $1),
 it will generate it.

Why is lock inc/dec worse than add/sub on Core2I7?
The only difference I know of is the way the carry flag
is handled.

  Best regards,
  Piotr Wyderski