On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti <alexgh...@rivosinc.com> wrote: > > Hi Andy, > > On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.c...@sifive.com> wrote: > > > > On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <a...@ghiti.fr> wrote: > > > > > > Hi Conor, > > > > > > On 17/06/2024 15:23, Alexandre Ghiti wrote: > > > > Hi Conor, > > > > > > > > Sorry for the delay here. > > > > > > > > On 13/06/2024 09:48, Conor Dooley wrote: > > > >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote: > > > >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used") > > > >>> converted ftrace_make_nop() to use patch_insn_write() which does not > > > >>> emit any icache flush relying entirely on __ftrace_modify_code() to do > > > >>> that. > > > >>> > > > >>> But we missed that ftrace_make_nop() was called very early directly > > > >>> when > > > >>> converting mcount calls into nops (actually on riscv it converts 2B > > > >>> nops > > > >>> emitted by the compiler into 4B nops). > > > >>> > > > >>> This caused crashes on multiple HW as reported by Conor and Björn > > > >>> since > > > >>> the booting core could have half-patched instructions in its icache > > > >>> which would trigger an illegal instruction trap: fix this by emitting > > > >>> a > > > >>> local flush icache when early patching nops. > > > >>> > > > >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used") > > > >>> Signed-off-by: Alexandre Ghiti <alexgh...@rivosinc.com> > > > >>> --- > > > >>> arch/riscv/include/asm/cacheflush.h | 6 ++++++ > > > >>> arch/riscv/kernel/ftrace.c | 3 +++ > > > >>> 2 files changed, 9 insertions(+) > > > >>> > > > >>> diff --git a/arch/riscv/include/asm/cacheflush.h > > > >>> b/arch/riscv/include/asm/cacheflush.h > > > >>> index dd8d07146116..ce79c558a4c8 100644 > > > >>> --- a/arch/riscv/include/asm/cacheflush.h > > > >>> +++ b/arch/riscv/include/asm/cacheflush.h > > > >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void) > > > >>> asm volatile ("fence.i" ::: "memory"); > > > >>> } > > > >>> +static inline void local_flush_icache_range(unsigned long start, > > > >>> + unsigned long end) > > > >>> +{ > > > >>> + local_flush_icache_all(); > > > >>> +} > > > >>> + > > > >>> #define PG_dcache_clean PG_arch_1 > > > >>> static inline void flush_dcache_folio(struct folio *folio) > > > >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c > > > >>> index 4f4987a6d83d..32e7c401dfb4 100644 > > > >>> --- a/arch/riscv/kernel/ftrace.c > > > >>> +++ b/arch/riscv/kernel/ftrace.c > > > >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct > > > >>> dyn_ftrace *rec) > > > >>> out = ftrace_make_nop(mod, rec, MCOUNT_ADDR); > > > >>> mutex_unlock(&text_mutex); > > > >> So, turns out that this patch is not sufficient. I've seen some more > > > >> crashes, seemingly due to initialising nops on this mutex_unlock(). > > > >> Palmer suggested moving the if (!mod) ... into the lock, which fixed > > > >> the problem for me. > > > > > > > > > > > > Ok, it makes sense, I completely missed that. I'll send a fix for that > > > > shortly so that it can be merged in rc5. > > > > > > > > Thanks, > > > > > > > > Alex > > > > > > > > > So I digged a bit more and I'm afraid that the only way to make sure > > > this issue does not happen elsewhere is to flush the icache right after > > > the patching. We actually can't wait to batch the icache flush since > > > along the way, we may call a function that has just been patched (the > > > issue that you encountered here). > > > > Trying to provide my thoughts, please let me know if I missed > > anything. I think what Conor suggested is safe for init_nop, as it > > would be called only when there is only one core (booting) and at the > > loading stage of kernel modules. In the first case we just have to > > make sure there is no patchable entry before the core executes > > fence.i. The second case is unconditionally safe because there is no > > read-side of the race. > > > > It is a bit tricky for the generic (runtime) case of ftrace code > > patching, but that is not because of the batch fence.i maintenance. As > > long as there exists a patchable entry for the stopping thread, it is > > possible for them to execute on a partially patched instruction. A > > solution for this is again to prevent any patchable entry in the > > stop_machine's stopping thread. Another solution is to apply the > > atomic ftrace patching series which aims to get rid of the race. > > Yeah but my worry is that we can't make sure that functions called > between the patching and the fence.i are not the ones that just get > patched. At least today, patch_insn_write() etc should be marked as
IIUC, the compiler does not generate a patchable entry for patch_insn_write, and so do all functions in patch.c. The entire file is not compiled with patchable entry because the flag is removed in riscv's Makefile. Please let me know if I misunderstand it. > noinstr. But even then, we call core functions that could be patchable > (I went through all and it *seems* we are safe *for now*, but this is > very fragile). Yes, functions in the "else" clause, atomic_read() etc, are safe for now. However, it is impossible to fix as long as there exists a patchable entry along the way. This is the problem that we cannot patch 2 instructions with a concurrent read-side. On the other hand, the path of ftrace_modify_all_code may experience the batch fence.i maintenance issue, and can be fixed via a local fence.i. This is because the path is executed by only one core. There does not exist a concurrent write-side. As a result, we just need to make sure it leaves each patch-site with a local fence.i to make sure code is up-to-date. > > Then what do you think we should do to fix Conor's issue right now: > should we mark the riscv specific functions as noinstr, cross fingers > that the core functions are not (and don't become) patchable and wait > for your atomic patching? Or should we flush as soon as possible as I > proposed above (which to me fixes the issue but hurts performance)? Either way works for me. IMHO, the fix should include: 1. move fence.i before mutex_unlock in init_nop 2. do local fence.i in __ftrace_modify_call 3. make sure the else clause does not happen to have a patchable entry Thanks, Andy > > Thanks, > > Alex > > > > > > > > > I don't know how much it will impact the performance but I guess it will. > > > > > > Unless someone has a better idea (I added Andy and Puranjay in cc), here > > > is the patch that implements this, can you give it a try? Thanks > > > > > > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c > > > index 87cbd86576b2..4b95c574fd04 100644 > > > --- a/arch/riscv/kernel/ftrace.c > > > +++ b/arch/riscv/kernel/ftrace.c > > > @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct > > > dyn_ftrace *rec) > > > out = ftrace_make_nop(mod, rec, MCOUNT_ADDR); > > > mutex_unlock(&text_mutex); > > > > > > - if (!mod) > > > - local_flush_icache_range(rec->ip, rec->ip + > > > MCOUNT_INSN_SIZE); > > > - > > > return out; > > > } > > > > > > @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data) > > > } else { > > > while (atomic_read(¶m->cpu_count) <= > > > num_online_cpus()) > > > cpu_relax(); > > > - } > > > > > > - local_flush_icache_all(); > > > + local_flush_icache_all(); > > > + } > > > > > > return 0; > > > } > > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c > > > index 4007563fb607..ab03732d06c4 100644 > > > --- a/arch/riscv/kernel/patch.c > > > +++ b/arch/riscv/kernel/patch.c > > > @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t > > > len) > > > > > > memset(waddr, c, len); > > > > > > + /* > > > + * We could have just patched a function that is about to be > > > + * called so make sure we don't execute partially patched > > > + * instructions by flushing the icache as soon as possible. > > > + */ > > > + local_flush_icache_range((unsigned long)waddr, > > > + (unsigned long)waddr + len); > > > + > > > patch_unmap(FIX_TEXT_POKE0); > > > > > > if (across_pages) > > > @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const > > > void *insn, size_t len) > > > > > > ret = copy_to_kernel_nofault(waddr, insn, len); > > > > > > + /* > > > + * We could have just patched a function that is about to be > > > + * called so make sure we don't execute partially patched > > > + * instructions by flushing the icache as soon as possible. > > > + */ > > > + local_flush_icache_range((unsigned long)waddr, > > > + (unsigned long)waddr + len); > > > + > > > patch_unmap(FIX_TEXT_POKE0); > > > > > > if (across_pages) > > > @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t > > > len) > > > > > > ret = patch_insn_set(tp, c, len); > > > > > > - if (!ret) > > > - flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len); > > > - > > > return ret; > > > } > > > NOKPROBE_SYMBOL(patch_text_set_nosync); > > > @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns, > > > size_t len) > > > > > > ret = patch_insn_write(tp, insns, len); > > > > > > - if (!ret) > > > - flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len); > > > - > > > return ret; > > > } > > > NOKPROBE_SYMBOL(patch_text_nosync); > > > @@ -253,9 +263,9 @@ static int patch_text_cb(void *data) > > > } else { > > > while (atomic_read(&patch->cpu_count) <= > > > num_online_cpus()) > > > cpu_relax(); > > > - } > > > > > > - local_flush_icache_all(); > > > + local_flush_icache_all(); > > > + } > > > > > > return ret; > > > } > > > > > > > > > > > > > > > > > >> > > > >>> + if (!mod) > > > >>> + local_flush_icache_range(rec->ip, rec->ip + > > > >>> MCOUNT_INSN_SIZE); > > > >>> + > > > >>> return out; > > > >>> } > > > >>> -- > > > >>> 2.39.2 > > > >>> > > > >>> > > > >>> _______________________________________________ > > > >>> linux-riscv mailing list > > > >>> linux-ri...@lists.infradead.org > > > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv > > > > > > > > _______________________________________________ > > > > linux-riscv mailing list > > > > linux-ri...@lists.infradead.org > > > > http://lists.infradead.org/mailman/listinfo/linux-riscv > > > > Cheers, > > Andy