http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48986
Summary: Missed optimization in atomic decrement on x86/x64
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: piotr.wyder...@gmail.com

Many uses of __sync_fetch_and_add() boil down to a decrement followed by a check whether the result is zero, in order to delete the pointee. The most natural way to write this is:

    bool xxx_decrement(int* p)
    {
        return __sync_fetch_and_add(p, -1) == 1;
    }

    void yyy(int* p)
    {
        if (xxx_decrement(p)) {
            delete p;
        }
    }

Unfortunately, GCC compiles it literally:

    <__Z3yyyPi>:
      40edd0:  83 ec 0c              sub    $0xc,%esp
      40edd3:  ba ff ff ff ff        mov    $0xffffffff,%edx
      40edd8:  8b 44 24 10           mov    0x10(%esp),%eax
      40eddc:  f0 0f c1 10           lock xadd %edx,(%eax)
      40ede0:  83 fa 01              cmp    $0x1,%edx
      40ede3:  74 0b                 je     40edf0 <__Z3yyyPi+0x20>
      40ede5:  83 c4 0c              add    $0xc,%esp
      40ede8:  c3                    ret
      40ede9:  8d b4 26 00 00 00 00  lea    0x0(%esi,%eiz,1),%esi
      40edf0:  89 44 24 10           mov    %eax,0x10(%esp)
      40edf4:  83 c4 0c              add    $0xc,%esp
      40edf7:  e9 24 03 00 00        jmp    40f120 <___wrap__ZdlPv>
      40edfc:  8d 74 26 00           lea    0x0(%esi,%eiz,1),%esi

with the gist being:

      40edd3:  ba ff ff ff ff        mov    $0xffffffff,%edx
      40eddc:  f0 0f c1 10           lock xadd %edx,(%eax)
      40ede0:  83 fa 01              cmp    $0x1,%edx
      40ede3:  74 0b                 je     40edf0 <__Z3yyyPi+0x20>

Since __sync_fetch_and_add() returns the old value, comparing it with 1 is the same as testing whether the new value is zero, and that is exactly what ZF reports after a locked sub or dec. The optimizer should therefore recognize this special case and emit:

    lock subl $0x1,(%eax)
    je ...

or, on platforms that do not suffer a carry-flag dependency penalty (dec leaves CF unmodified), e.g. some AMD chips:

    lock decl (%eax)
    je ...

Note that this generalizes to any N:

    return __sync_fetch_and_add(p, -N) == N;

which can become lock subl $N,(%eax) followed by je; for N != 1 the dec replacement cannot be used.
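The transformation relies on the identity "old value == N" if and only if "old value - N == 0". A minimal sketch of that equivalence, written with GCC's __sync_sub_and_fetch() builtin (which returns the new value rather than the old one). The helper names below are hypothetical, and the check is single-threaded, so it only demonstrates the arithmetic, not atomicity:

```cpp
// Hypothetical helper names, for illustration only.

// The form from this report: __sync_fetch_and_add() returns the OLD value.
static bool dec_by_n_old_form(int* p, int n) {
    return __sync_fetch_and_add(p, -n) == n;
}

// Equivalent form: __sync_sub_and_fetch() returns the NEW value, so the test
// becomes "did the counter reach zero?", which is what ZF reports after
// "lock sub $n,(mem)".
static bool dec_by_n_new_form(int* p, int n) {
    return __sync_sub_and_fetch(p, n) == 0;
}

// Runs both forms on separate copies of the same starting value and reports
// whether they agree on both the returned condition and the resulting value.
static bool forms_agree(int start, int n) {
    int a = start;
    int b = start;
    bool ra = dec_by_n_old_form(&a, n);
    bool rb = dec_by_n_new_form(&b, n);
    return ra == rb && a == b;
}
```

Whether a given GCC release actually turns either form into lock sub/je is exactly what this report is about, so the sketch makes no claim about the generated code.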
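For comparison, the sequence requested above can be hand-written with GCC inline assembly. This is a sketch, not what GCC emits: the helper names are hypothetical, it targets x86/x86-64 only (other targets fall back to __sync_sub_and_fetch()), and it materializes ZF with sete because a plain asm statement cannot branch on flags directly:

```cpp
// Hypothetical helpers: hand-written "lock dec/sub; sete" sequences.

// Decrements *p atomically; returns true if the new value is zero.
static inline bool atomic_dec_and_test(int* p) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned char zero;
    __asm__ __volatile__("lock decl %0\n\t"
                         "sete %1"
                         : "+m"(*p), "=q"(zero)   // "q": byte-addressable reg for sete
                         :
                         : "memory", "cc");
    return zero != 0;
#else
    return __sync_sub_and_fetch(p, 1) == 0;       // portable fallback
#endif
}

// Subtracts n from *p atomically; returns true if the new value is zero.
// This is the general-N case, where dec cannot be used.
static inline bool atomic_sub_and_test(int* p, int n) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned char zero;
    __asm__ __volatile__("lock subl %2, %0\n\t"
                         "sete %1"
                         : "+m"(*p), "=q"(zero)
                         : "ir"(n)                 // immediate or register source
                         : "memory", "cc");
    return zero != 0;
#else
    return __sync_sub_and_fetch(p, n) == 0;       // portable fallback
#endif
}
```

The extra sete/test pair is the cost of going through a boolean; code generated by the optimizer itself could branch on ZF right after the locked instruction, as shown in the je sequences above.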