http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49028
--- Comment #1 from Piotr Wyderski <piotr.wyderski at gmail dot com> 2011-05-17 17:24:03 UTC ---

If I change the function to:

    template <unsigned int N> void R<N>::xxx_release(void* p) {

        char* q = reinterpret_cast<char*>(m_Cursor);
        char* b = reinterpret_cast<char*>(m_Data);

        q = ((q + sizeof(void*)) - b) % (N * sizeof(void*)) + b;
        m_Cursor = reinterpret_cast<void**>(q);
        *m_Cursor = p;
    }

Then the generated code is:

    000000000041a910 <_ZN1RILj16EE11xxx_releaseEPv>:
      41a910:  48 8b 87 80 00 00 00   mov    0x80(%rdi),%rax
      41a917:  48 83 c0 08            add    $0x8,%rax
      41a91b:  48 29 f8               sub    %rdi,%rax
      41a91e:  83 e0 7f               and    $0x7f,%eax
      41a921:  48 01 f8               add    %rdi,%rax
      41a924:  48 89 87 80 00 00 00   mov    %rax,0x80(%rdi)
      41a92b:  48 89 30               mov    %rsi,(%rax)
      41a92e:  c3                     retq

which is astonishingly close to my hand-made assembly...
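For readers who want to try this, here is a minimal, self-contained sketch of the same wrap-around step. Only m_Data, m_Cursor and the N = 16 instantiation come from the report. The struct name Ring, the member order, the constructor, the release() and cursor_index() helpers, and the power-of-two check are assumptions made for illustration. It works on element indices rather than byte offsets, so it is a variation on the function above, not the original class. With N = 16 and 8-byte pointers, N * sizeof(void*) is 128, which is why the modulo in the listing above shows up as and $0x7f.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for R<N>; only m_Data and m_Cursor are taken
// from the report, everything else is assumed for this sketch.
template <unsigned int N>
struct Ring {
    // Assumption: the capacity is a power of two, so that the unsigned
    // modulo below can be reduced to a single AND with N - 1.
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

    void*  m_Data[N];   // slot storage
    void** m_Cursor;    // points at the most recently written slot

    Ring() : m_Data(), m_Cursor(m_Data) {}

    // Advance the cursor by one slot, wrapping at the end, then store p.
    void release(void* p) {
        std::size_t index = static_cast<std::size_t>(m_Cursor - m_Data);
        index = (index + 1) % N;  // unsigned operand, constant power-of-two divisor
        m_Cursor = m_Data + index;
        *m_Cursor = p;
    }

    // Current cursor position as a slot index in [0, N).
    std::size_t cursor_index() const {
        return static_cast<std::size_t>(m_Cursor - m_Data);
    }
};
```

Calling release() sixteen times on a Ring<16> brings the cursor back to slot 0. Whether a given compiler version emits the compact and-based sequence for this index form has not been checked here; comparing the output of objdump -d for both forms is the way to find out.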