On 18/08/2021 14:05, Michael Ellerman wrote:
> Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
> on one of his systems:
>
>   kernel tried to execute exec-protected page (c008000004073278) - exploit attempt? (uid: 0)
>   BUG: Unable to handle kernel instruction fetch
>   Faulting instruction address: 0xc008000004073278
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>   Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
>   CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
>   Workqueue: events control_work_handler [virtio_console]
>   NIP: c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
>   REGS: c00000002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+)
>   MSR: 800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 200400cf
>   ...
>   NIP fill_queue+0xf0/0x210 [virtio_console]
>   LR fill_queue+0xf0/0x210 [virtio_console]
>   Call Trace:
>     fill_queue+0xb4/0x210 [virtio_console] (unreliable)
>     add_port+0x1a8/0x470 [virtio_console]
>     control_work_handler+0xbc/0x1e8 [virtio_console]
>     process_one_work+0x290/0x590
>     worker_thread+0x88/0x620
>     kthread+0x194/0x1a0
>     ret_from_kernel_thread+0x5c/0x64
>
> Jordan, Fabiano & Murilo were able to reproduce and identify that the
> problem is caused by the call to module_enable_ro() in do_init_module(),
> which happens after the module's init function has already been called.
>
> Our current implementation of change_page_attr() is not safe against
> concurrent accesses, because it invalidates the PTE before flushing the
> TLB and then installing the new PTE. That leaves a window in time where
> there is no valid PTE for the page, if another CPU tries to access the
> page at that time we see something like the fault above.
>
> We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
> code doesn't handle a set_pte_at() of a valid PTE. See [1].
>
> But we do have pte_update(), which replaces the old PTE with the new,
> meaning there's no window where the PTE is invalid. And the hash MMU
> version hash__pte_update() deals with synchronising the hash page table
> correctly.
>
> [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r....@linux.ibm.com/
>
> Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
> Reported-by: Laurent Vivier <lviv...@redhat.com>
> Signed-off-by: Fabiano Rosas <faro...@linux.ibm.com>
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
> ---
>  arch/powerpc/mm/pageattr.c | 23 ++++++++++-------------
>  1 file changed, 10 insertions(+), 13 deletions(-)
>
> v2: Use pte_update(..., ~0, pte_val(pte), ...) as suggested by Fabiano,
>     and ptep_get() as suggested by Christophe.
>
> diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> index 0876216ceee6..edea388e9d3f 100644
> --- a/arch/powerpc/mm/pageattr.c
> +++ b/arch/powerpc/mm/pageattr.c
> @@ -18,16 +18,12 @@
>  /*
>   * Updates the attributes of a page in three steps:
>   *
> - * 1. invalidate the page table entry
> - * 2. flush the TLB
> - * 3. install the new entry with the updated attributes
> - *
> - * Invalidating the pte means there are situations where this will not work
> - * when in theory it should.
> - * For example:
> - * - removing write from page whilst it is being executed
> - * - setting a page read-only whilst it is being read by another CPU
> + * 1. take the page_table_lock
> + * 2. install the new entry with the updated attributes
> + * 3. flush the TLB
>   *
> + * This sequence is safe against concurrent updates, and also allows updating the
> + * attributes of a page currently being executed or accessed.
>   */
>  static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>  {
> @@ -36,9 +32,7 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>
>  	spin_lock(&init_mm.page_table_lock);
>
> -	/* invalidate the PTE so it's safe to modify */
> -	pte = ptep_get_and_clear(&init_mm, addr, ptep);
> -	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +	pte = ptep_get(ptep);
>
>  	/* modify the PTE bits as desired, then apply */
>  	switch (action) {
> @@ -59,11 +53,14 @@ static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
>  		break;
>  	}
>
> -	set_pte_at(&init_mm, addr, ptep, pte);
> +	pte_update(&init_mm, addr, ptep, ~0UL, pte_val(pte), 0);
>
>  	/* See ptesync comment in radix__set_pte_at() */
>  	if (radix_enabled())
>  		asm volatile("ptesync": : :"memory");
> +
> +	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +
>  	spin_unlock(&init_mm.page_table_lock);
>
>  	return 0;
>
> base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30
>
Tested-by: Laurent Vivier <lviv...@redhat.com>