> I did not even think of that, and it actually makes sense so I'll go > with what you propose: I'll replace atomic_inc() with > atomic_inc_return_release(). And I'll add the following comment if > that's ok with you: > > "Make sure the patching store is effective *before* we increment the > counter which releases all waiting cpus"
Yes, this sounds good to me. > Honestly, I looked at it one minute, did not understand its purpose > and said to myself "ok that can't hurt anyway, I may be missing > something". > > FWIW, I see that arm64 uses isb() here. If you don't see its purpose, > I'll remove it (here and where I copied it). Removing the smp_mb() (and keeping the local_flush_icache_all()) seems fine to me; thanks for the confirmation. > > On a last topic, although somehow orthogonal to the scope of this patch, > > I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is > > correct: I can see why we may want (need to do) the local TLB flush be- > > fore returning from patch_{map,unmap}(), but does a local flush suffice? > > For comparison, arm64 seems to go through a complete dsb-tlbi-dsb(-isb) > > sequence in their unmapping stage (and apparently relying on "no caching > > of invalid ptes" in their mapping stage). Of course, "broadcasting" our > > (riscv's) TLB invalidations will necessary introduce some complexity... > > > > Thoughts? > > To avoid remote TLBI, could we simply disable the preemption before > the first patch_map()? arm64 disables the irqs, but that seems > overkill to me, but maybe I'm missing something again? Mmh, I'm afraid this will require more thinking/probing on my end (not really "the expert" of the codebase at stake...). Maybe the ftrace reviewers will provide further ideas/suggestions for us to brainstorm. Andrea