The Itanium code for GOMP_critical_start starts:

0x2000000000334900 <GOMP_critical_start>:      [MMI]       alloc r16=ar.pfs,1,1,0
0x2000000000334901 <GOMP_critical_start+1>:                addl r32=840,r1
0x2000000000334902 <GOMP_critical_start+2>:                nop.i 0x0
0x2000000000334910 <GOMP_critical_start+16>:   [MMI]       mf;;
0x2000000000334911 <GOMP_critical_start+17>:               mov.m ar.ccv=0
0x2000000000334912 <GOMP_critical_start+18>:               mov r14=1;;
0x2000000000334920 <GOMP_critical_start+32>:   [MMI]       nop.m 0x0
0x2000000000334921 <GOMP_critical_start+33>:               cmpxchg4.rel r14=[r32],r14,ar.ccv
0x2000000000334922 <GOMP_critical_start+34>:               nop.i 0x0;;
0x2000000000334930 <GOMP_critical_start+48>:   [MIB]       nop.m 0x0
0x2000000000334931 <GOMP_critical_start+49>:               cmp.eq p6,p7=0,r14
0x2000000000334932 <GOMP_critical_start+50>:         (p06) br.ret.dptk.many b0;;
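For readers who prefer source to disassembly, the sequence above looks like the fast path of a compare-and-swap lock. The following is only a minimal sketch of that pattern, not the libgomp source: the names sketch_mutex_t, sketch_trylock and sketch_unlock are hypothetical, and the 0 = unlocked / 1 = locked encoding is an assumption read off the ar.ccv=0 and r14=1 operands. The GCC documentation describes __sync_bool_compare_and_swap as a full barrier in most cases, so the generated code would be expected to keep later stores from moving ahead of the swap.

```c
/* Hypothetical stand-in for the libgomp mutex word (assumption):
   0 means unlocked, 1 means locked. */
typedef int sketch_mutex_t;

/* Try to take the lock once. The legacy __sync builtin is documented
   by GCC as a full barrier in most cases, so accesses that follow a
   successful swap should not become visible before it.
   Returns nonzero if the lock was acquired. */
static int sketch_trylock(sketch_mutex_t *m)
{
    return __sync_bool_compare_and_swap(m, 0, 1);
}

/* Release the lock. __sync_lock_release writes 0 with release
   semantics: earlier accesses may not move after the store. */
static void sketch_unlock(sketch_mutex_t *m)
{
    __sync_lock_release(m);
}
```

A caller would write to shared data only between a successful sketch_trylock and the matching sketch_unlock. The concern in this report is whether the Itanium instructions emitted for the compare-and-swap give that guarantee on the acquire side.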
Note the mf followed by a cmpxchg4.rel. I don't believe this enforces sufficient memory ordering constraints. A subsequent store from inside the critical section may become visible to other threads before the cmpxchg4.rel, because the .rel completer is only intended to prevent reordering in the OTHER direction (earlier accesses moving after it). The mf comes before the cmpxchg4.rel, so as far as I can tell it does not order the cmpxchg4.rel against later accesses either. Thus a store inside the critical section can become visible before the lock is really acquired, which is, at least theoretically, very bad.

I do not know whether current hardware may actually execute these out of order. I observed this while trying to understand the GNU OpenMP support.

I also don't know whether this problem is limited to Itanium. I expect it doesn't exist on X86. It may exist on other weakly-ordered architectures.

I believe that this is due to incorrect code generated for the __sync_bool_compare_and_swap in gomp_mutex_lock().

--
           Summary: GOMP_critical_start wrong on Itanium due to __sync miscompilation
           Product: gcc
           Version: 4.4.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: Hans dot Boehm at hp dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42869