On Thu, Jun 26, 2025 at 5:07 PM Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote:
>
> On 2025-06-26 11:52, Dylan Yudaken wrote:
> > No reason to not allow MEMBARRIER_CMD_FLAG_CPU on
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED or
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
> >
> > If it is known specifically what cpu you want to interrupt then there
> > is a decent efficiency saving in not interrupting all the other ones.
> >
> > Also - the code already works as is for them.
>
> Can you elaborate on a concrete use-case justifying adding this ?
>
> Thanks,
>
> Mathieu
>

So my use case is for core-local data such as performance counters. I have a library that allows a fast thread to "lock" a core -> do some work (probably incrementing some performance counters) -> unlock. The "lock" uses restartable sequences (i.e. no serializing instructions), and the unlock just writes a 0 to memory (again, no serializing instructions).

A slow thread will occasionally (say every few minutes) try to read the data computed in the work section. It does this by disabling locking and firing off a membarrier(RSEQ) on that core, so it can be sure the core is in a definite state, either "locked" or "unlocked". It then spins waiting for the core to be unlocked.

At this point my understanding is a bit fuzzy, but I believe that core needs a memory barrier: since there is no serializing instruction, the processor would happily reorder some of the "work" after the "unlock" store. Getting that barrier onto the core is what I want from this. And since I know the cpu_id I am working with, I don't need to do a barrier on _all_ the cores.

To be clear:
(1) I don't have a current real world use case.
(2) My library/design/understanding might be buggy.
(3) I don't have a use case for the SYNC_CORE part, but again it seemed easy enough to add and I presume others might have a use case.
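
For illustration, the slow-reader side described above could look roughly like the sketch below. It is only a sketch under assumptions: `struct core_slot`, `membarrier_call`, `register_rseq_membarrier` and `reader_snapshot` are hypothetical names, not taken from the library, and the rseq-based fast path (lock, work, unlock) is left out entirely. It uses only the combination that mainline already accepts, MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ with MEMBARRIER_CMD_FLAG_CPU (Linux 5.10 or later headers assumed).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/membarrier.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-core slot; names and layout are illustrative only. */
struct core_slot {
    _Atomic uint32_t locked;        /* 1 while the fast thread is in its work section */
    _Atomic uint32_t lock_disabled; /* set by the slow reader to refuse new locks */
    uint64_t counter;               /* written by the fast thread while locked */
};

/* Thin wrapper that issues the membarrier system call via syscall(2). */
static int membarrier_call(int cmd, unsigned int flags, int cpu_id)
{
    return (int)syscall(SYS_membarrier, cmd, flags, cpu_id);
}

/* Has to succeed once per process before the expedited RSEQ command is used. */
static int register_rseq_membarrier(void)
{
    return membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ, 0, 0);
}

/*
 * Slow path: read one core's counter.
 * Returns 0 and stores the value in *out, or -1 with errno set if the
 * membarrier call failed (for example: not registered, unsupported kernel).
 */
static int reader_snapshot(struct core_slot *slot, int cpu_id, uint64_t *out)
{
    int rc;

    assert(slot != NULL && out != NULL);

    /* 1. Stop the fast thread from taking the lock again. */
    atomic_store(&slot->lock_disabled, 1);

    /* 2. Interrupt only cpu_id. The assumption from the email is that the
     *    core is then in a definite state: either locked or unlocked. */
    rc = membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ,
                         MEMBARRIER_CMD_FLAG_CPU, cpu_id);
    if (rc == 0) {
        /* 3. Spin until the fast thread has written 0 to the lock word. */
        while (atomic_load_explicit(&slot->locked, memory_order_acquire) != 0)
            ;
        *out = slot->counter;
    }

    /* 4. Re-enable locking whether or not the barrier call succeeded. */
    atomic_store(&slot->lock_disabled, 0);
    return rc;
}
```

A caller would run register_rseq_membarrier() once at startup and then call reader_snapshot(&slot, cpu_id, &value) every few minutes. The point of passing cpu_id is visible in step 2: only that one core gets the IPI, while without the flag every core running a thread of the process would be interrupted.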