Vaibhav Jain <vaib...@linux.vnet.ibm.com> writes: > Hi Mpe, > > Thanks for reviewing the patch > > Michael Ellerman <m...@ellerman.id.au> writes: > >>> + ctx->elem->software_state = cpu_to_be32(CXL_PE_SOFTWARE_STATE_V); >>> + /* Make sure the changes to the PE are visible to the card */ >> >> A barrier orders something vs something else. So what's the something >> else in this case? Is it the afu_reset() below, what does that actually do? >> > > The issue is with call to afu_enable() after the call to afu_reset that > would start the AFU. If this load gets reordered and PSL doesnt see the > valid bit set for this structure then it will result in PSL entering a > freeze-state.
OK, so it's ordering the store above to ctx->elem->software_state vs the store to the AFU in afu_enable(). > Though on second thoughts afu_enable() is grabbing a spin-lock before > doing an mmio to start the AFU that would be forcing a barrier > anyways. But since that spans the function boundary hence to be safe > have added a write barrier after populating the process element. The spin lock doesn't help you, stores are allowed to leak into the locked region. But the MMIO is preceeded by a sync. > Lastly function is not performance critical as it will be usually called > in the life time of a process only once. So the impact smp_wmb() is > having would be minimal. Sure. Performance is not the issue, barriers are subtle so it's important that they're well documented. So I don't think you need the barrier, because the out_be64() will do it for you. But if you really want to add one, I don't mind. But, you should use wmb(), not smp_wmb(), because the ordering is still required on non-SMP systems. And please update the comment to capture all of the above discussion. cheers