http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46447

--- Comment #3 from Benjamin Kosnik <bkoz at gcc dot gnu.org> 2012-02-03 21:23:44 UTC ---
Just an update on this... with gcc-4.7 2012-02-03

-g -O2
%./a-02.out
spin_mutex_tt took 84.0168 cycles (averaged over 10000 trials)
af_set_tt took 36.7017 cycles (averaged over 10000 trials)
af_clear_tt took 60.3904 cycles (averaged over 10000 trials)
aint_bit_set_tt took 52.4867 cycles (averaged over 10000 trials)
aint_clear_tt took 60.3904 cycles (averaged over 10000 trials)
auchar_bit_set_tt took 52.4867 cycles (averaged over 10000 trials)
auchar_clear_tt took 60.3911 cycles (averaged over 10000 trials)
sizeof(aint) = 4
sizeof(auchar) = 1
sizeof(std::atomic_flag) = 1

-g -O3
%./a-03.out
spin_mutex_tt took 84.0168 cycles (averaged over 10000 trials)
af_set_tt took 36.7017 cycles (averaged over 10000 trials)
af_clear_tt took 60.3904 cycles (averaged over 10000 trials)
aint_bit_set_tt took 52.486 cycles (averaged over 10000 trials)
aint_clear_tt took 60.3911 cycles (averaged over 10000 trials)
auchar_bit_set_tt took 52.4867 cycles (averaged over 10000 trials)
auchar_clear_tt took 60.3911 cycles (averaged over 10000 trials)
sizeof(aint) = 4
sizeof(auchar) = 1
sizeof(std::atomic_flag) = 1

So, it looks like atomic_flag set is now faster (!), but that the clears are
comparable.

I'd like to generalize this test case for the performance testsuite.
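
For reference, a minimal sketch of what a generalized set/clear timing loop could look like. This is not the test case attached to the bug: the names average_ns, timings and run_trials are hypothetical, it reports wall-clock nanoseconds from std::chrono rather than cycles, and it assumes the "bit set" and "clear" cases on std::atomic<int> correspond to fetch_or(1) and store(0).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the original test case): average cost of
// one call to `op`, in nanoseconds, measured over `trials` calls.
template <typename Op>
double average_ns(Op op, std::size_t trials) {
    assert(trials > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < trials; ++i)
        op();
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(trials);
}

// Hypothetical result record; one field per measured operation.
struct timings {
    double flag_set_ns;
    double flag_clear_ns;
    double int_bit_set_ns;
    double int_clear_ns;
};

// Times set and clear on std::atomic_flag and on std::atomic<int>.
// Assumption: "bit set" is fetch_or(1) and "clear" is store(0).
inline timings run_trials(std::size_t trials) {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
    std::atomic<int> aint(0);

    timings t;
    t.flag_set_ns    = average_ns([&] { flag.test_and_set(); }, trials);
    t.flag_clear_ns  = average_ns([&] { flag.clear(); }, trials);
    t.int_bit_set_ns = average_ns([&] { aint.fetch_or(1); }, trials);
    t.int_clear_ns   = average_ns([&] { aint.store(0); }, trials);
    return t;
}
```

Calling run_trials(10000) matches the trial count used above. The absolute numbers will differ from the cycle counts in this comment, so only the relative ordering of set versus clear is worth comparing.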