Hello Mathieu,

While trying to find some unrelated bug, something in
sync_runqueues_membarrier_state() caught my eye:

  static int sync_runqueues_membarrier_state(struct mm_struct *mm)
  {
	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1) {
		this_cpu_write(runqueues.membarrier_state, membarrier_state);

		/*
		 * For single mm user, we can simply issue a memory barrier
		 * after setting MEMBARRIER_STATE_GLOBAL_EXPEDITED in the
		 * mm and in the current runqueue to guarantee that no memory
		 * access following registration is reordered before
		 * registration.
		 */
		smp_mb();
		return 0;
	}

  [ snip ]

	smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);

And ipi_sync_rq_state() does:

	this_cpu_write(runqueues.membarrier_state,
		       atomic_read(&mm->membarrier_state));

So my question: are you aware smp_call_function_many() would not run
ipi_sync_rq_state() on the local CPU? Is that the intention of the code?

Thanks,
Nadav