Hi Mark,

On 01/10/18 11:41, James Morse wrote:
> On 24/09/18 17:36, Mark Rutland wrote:
>> On Mon, Sep 17, 2018 at 12:43:32PM +0800, Jun Yao wrote:
>>> Since we will move the swapper_pg_dir to rodata section, we need a
>>> way to update it. The fixmap can handle it. When the swapper_pg_dir
>>> needs to be updated, we map it dynamically. The map will be
>>> canceled after the update is complete. In this way, we can defend
>>> against KSMA(Kernel Space Mirror Attack).
>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 71532bcd76c1..a8a60927f716 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -67,6 +67,24 @@ static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>>>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
>>>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>>>
>>> +static DEFINE_SPINLOCK(swapper_pgdir_lock);
>>> +
>>> +void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
>>> +{
>>> +	pgd_t *fixmap_pgdp;
>>> +
>>> +	spin_lock(&swapper_pgdir_lock);
>>> +	fixmap_pgdp = pgd_set_fixmap(__pa(pgdp));
>>> +	WRITE_ONCE(*fixmap_pgdp, pgd);
>>> +	/*
>>> +	 * We need dsb(ishst) here to ensure the page-table-walker sees
>>> +	 * our new entry before set_p?d() returns. The fixmap's
>>> +	 * flush_tlb_kernel_range() via clear_fixmap() does this for us.
>>> +	 */
>>> +	pgd_clear_fixmap();
>>> +	spin_unlock(&swapper_pgdir_lock);
>>> +}

>> Are we certain we never poke the kernel page tables in IRQ context?
>
> The RAS code was doing this, but was deemed unsafe, and changed to use the
> fixmap: https://lkml.org/lkml/2017/10/30/500
> The fixmap only ever touches the last level, so can't hit this.
>
> x86 can't do its IPI tlb-maintenance from IRQ context, so anything trying to
> unmap from irq context is already broken: https://lkml.org/lkml/2018/9/6/324
>
> vunmap()/vfree() is allowed from irq context, but it defers its work.
>
> I can't find any way to pass GFP_ATOMIC into ioremap(),
> I didn't think vmalloc() could either, ... but now I spot __vmalloc() does...
>
> This __vmalloc() path is used by the percpu allocator, which starting from
> pcpu_alloc() can be passed something other than GFP_KERNEL, and uses
> spin_lock_irqsave(), so it is expecting to be called in irq context.
>
> ... so yes it looks like this can happen.

But! These two things (irq-context and calls-__vmalloc()) can't happen at the
same time. If pcpu_alloc() is passed GFP_ATOMIC, and pcpu_alloc_area() fails,
(so a new chunk needs to be allocated), it will fail instead.
(This explains the scary looking "if (!in_atomic) mutex_lock()", in that code).

If you try it, you hit the "BUG_ON(in_interrupt())", in __get_vm_area_node().
So even if you do pass GFP_ATOMIC in here, you can't call it from interrupt
context. (sanity prevails!)

I was wrong, it doesn't need fixing.


James
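
For illustration only, the argument above can be modelled in user space
roughly as follows. This is a sketch, not the kernel code: every model_*
name is a hypothetical stand-in, the chunk is reduced to a byte counter,
and the 4096-byte refill size is an arbitrary assumption. It only shows
the control flow described: an atomic request that cannot be served from
the existing area fails, so the vmalloc-backed path is never reached from
atomic context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a percpu chunk: just a count of free bytes. */
struct model_chunk {
	size_t free_bytes;
};

static bool model_in_interrupt;	/* stands in for in_interrupt() */
static int model_vmalloc_calls;	/* how often the vmalloc-like path ran */

/* Models the new-chunk path that ends up in __get_vm_area_node(). */
static void model_vmalloc_new_chunk(struct model_chunk *c)
{
	/* Models BUG_ON(in_interrupt()): never legal from interrupt context. */
	assert(!model_in_interrupt);
	model_vmalloc_calls++;
	c->free_bytes = 4096;	/* assumed refill size, illustrative only */
}

/* Models pcpu_alloc_area(): carve the request out of the existing chunk. */
static bool model_alloc_area(struct model_chunk *c, size_t size)
{
	if (c->free_bytes < size)
		return false;
	c->free_bytes -= size;
	return true;
}

/* Models the decision described for pcpu_alloc(). */
static bool model_pcpu_alloc(struct model_chunk *c, size_t size, bool is_atomic)
{
	if (model_alloc_area(c, size))
		return true;

	/* Atomic callers fail here instead of creating a new chunk. */
	if (is_atomic)
		return false;

	model_vmalloc_new_chunk(c);
	return model_alloc_area(c, size);
}
```

In this model, an atomic caller sees a failed allocation once the chunk is
exhausted and model_vmalloc_calls stays at zero; only a non-atomic caller
reaches the new-chunk path.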