Re: More atomic functions please
On Fri, May 13, 2011 at 07:55:44AM +0200, Piotr Wyderski wrote:
> Jakub Jelinek wrote:
>> /* X86_TUNE_USE_INCDEC */
>> ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
>>
>> So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl; if addl
>> is better (e.g. on Atom incl is very bad compared to addl $1), it will
>> generate that instead.
>
> Why is lock inc/dec worse than add/sub on Core2I7? The only difference
> I know of is the way the carry flag is handled.

Yeah, and that is exactly the problem: instructions that set only a subset
of the flags are problematic. See e.g. Intel's 248966.pdf, 3.5.1.1 "Use of
the INC and DEC Instructions".

	Jakub
Re: More atomic functions please
Jakub Jelinek wrote:
> And that's the right thing to do.

I concur. But the exchange case remains open.

> Please file an enhancement request in gcc bugzilla.

Done, 48986. I have also noticed several other missing optimizations in
this area, so I'm about to report them too.

Best regards,
Piotr Wyderski
Re: More atomic functions please
On Thu, 12 May 2011, Piotr Wyderski wrote:
> Hello, I'm not sure if it should be better handled as missed
> optimization, but there is a certain lack of functionality in the GCC's
> __sync_* function family.

I don't think we should add new functions to that family; instead the aim
should be to implement the functionality (built-in functions etc.) required
for a good implementation of the C1x and C++0x atomics support, and
recommend users to use those in future.

http://gcc.gnu.org/wiki/Atomic

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: More atomic functions please
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
> Hello, I'm not sure if it should be better handled as missed
> optimization, but there is a certain lack of functionality in the GCC's
> __sync_* function family. When implementing a reference-counting smart
> pointer, two operations are of crucial importance:
>
>     void __sync_increment(T* p);
>     bool __sync_decrement_iszero(T* p);
>
> The former only increments the location pointed to by p; the latter
> decrements it and returns true if and only if the result was zero. Both
> can be implemented in terms of existing __sync functions (and what
> can't? -- since there is __sync_bool_compare_and_swap()), e.g.:
>
>     void __sync_increment(T* p) {
>         __sync_fetch_and_add(p, 1);
>     }
>
>     bool __sync_decrement_iszero(T* p) {
>         return __sync_fetch_and_add(p, -1) == 1;
>     }

And that's the right thing to do. If the generated code is not optimal, we
should just improve __sync_fetch_and_add code generation. It isn't hard to
special-case addition of 1 or -1, and we already care whether the result of
the builtin is used or ignored. For == 1 we could add some pattern that the
combiner would merge. Please file an enhancement request in gcc bugzilla.

	Jakub
Re: More atomic functions please
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
> Unfortunately, on x86/x64 both are compiled in a rather poor way:
>
>     __sync_increment:
>         lock addl $0x1, (ptr)
>
> which is longer than:
>
>     lock incl (ptr)

GCC actually generates lock incl (ptr) already now, it just depends on
which CPU you optimize for:

    /* X86_TUNE_USE_INCDEC */
    ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),

So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl; if addl
is better (e.g. on Atom incl is very bad compared to addl $1), it will
generate that instead.

	Jakub
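The tuning difference is easy to observe by compiling a one-line wrapper with different -mtune values and reading the assembly. The command lines in the comment are an assumption about a typical session (file name and exact output are illustrative), not taken from the thread:

```c
/* Hypothetical session for inspecting the X86_TUNE_USE_INCDEC effect:
 *   gcc -O2 -S -o - -mtune=k8   sync_inc.c   # typically: lock incl (%rdi)
 *   gcc -O2 -S -o - -mtune=atom sync_inc.c   # typically: lock addl $1, (%rdi)
 */
void sync_inc(int *p)
{
    /* The result is unused, so GCC can emit a plain lock'd RMW
       instruction instead of lock xadd. */
    (void)__sync_fetch_and_add(p, 1);
}
```

Whichever encoding is chosen, the semantics are identical; only instruction length and the partial-flags behavior (inc leaves CF untouched) differ.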
Re: More atomic functions please
Jakub Jelinek wrote:
> /* X86_TUNE_USE_INCDEC */
> ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
>
> So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl, if addl
> is better (e.g. on Atom incl is very bad compared to addl $1), it will
> generate it.

Why is lock inc/dec worse than add/sub on Core2I7? The only difference I
know of is the way the carry flag is handled.

Best regards,
Piotr Wyderski