Re: [PATCH v5 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.

2018-10-01 Thread James Morse
Hi Mark,

On 01/10/18 11:41, James Morse wrote:
> On 24/09/18 17:36, Mark Rutland wrote:
>> On Mon, Sep 17, 2018 at 12:43:32PM +0800, Jun Yao wrote:
>>> Since we will move swapper_pg_dir to the rodata section, we need a
>>> way to update it. The fixmap can handle this: when swapper_pg_dir
>>> needs to be updated, we map it dynamically, and the mapping is
>>> removed once the update is complete. In this way, we can defend
>>> against KSMA (Kernel Space Mirror Attack).
> 
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 71532bcd76c1..a8a60927f716 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -67,6 +67,24 @@ static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>>>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
>>>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>>>  
>>> +static DEFINE_SPINLOCK(swapper_pgdir_lock);
>>> +
>>> +void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
>>> +{
>>> +   pgd_t *fixmap_pgdp;
>>> +
>>> +   spin_lock(&swapper_pgdir_lock);
>>> +   fixmap_pgdp = pgd_set_fixmap(__pa(pgdp));
>>> +   WRITE_ONCE(*fixmap_pgdp, pgd);
>>> +   /*
>>> +    * We need dsb(ishst) here to ensure the page-table-walker sees
>>> +    * our new entry before set_p?d() returns. The fixmap's
>>> +    * flush_tlb_kernel_range() via clear_fixmap() does this for us.
>>> +    */
>>> +   pgd_clear_fixmap();
>>> +   spin_unlock(&swapper_pgdir_lock);
>>> +}

>> Are we certain we never poke the kernel page tables in IRQ context?
> 
> The RAS code was doing this, but it was deemed unsafe and was changed to use
> the fixmap: https://lkml.org/lkml/2017/10/30/500
> The fixmap only ever touches the last level, so it can't hit this.
> 
> x86 can't do its IPI tlb-maintenance from IRQ context, so anything trying to
> unmap from irq context is already broken: https://lkml.org/lkml/2018/9/6/324
> 
> vunmap()/vfree() is allowed from irq context, but it defers its work.
> 
> I can't find any way to pass GFP_ATOMIC into ioremap(), and I didn't think
> vmalloc() could either... but now I spot that __vmalloc() does.
> 
> This __vmalloc() path is used by the percpu allocator: starting from
> pcpu_alloc(), it can be passed something other than GFP_KERNEL, and it uses
> spin_lock_irqsave(), so it expects to be called from irq context.
> 
> ... so yes it looks like this can happen.

But! These two things (irq context and calling __vmalloc()) can't happen at the
same time. If pcpu_alloc() is passed GFP_ATOMIC and pcpu_alloc_area() fails
(so a new chunk would need to be allocated), it fails the allocation instead.

(This explains the scary-looking "if (!is_atomic) mutex_lock()" in that code.)
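
For reference, roughly what that path looks like in mm/percpu.c (trimmed from
a v4.19-era tree, quoted from memory, so treat the details as approximate):

static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
				 gfp_t gfp)
{
	/* anything short of GFP_KERNEL counts as atomic here */
	bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;

	[...]
	if (!is_atomic)
		mutex_lock(&pcpu_alloc_mutex);
	[...]
	/* no space in the existing chunks: a new chunk means __vmalloc() */
	if (is_atomic) {
		err = "atomic alloc failed, no space left";
		goto fail;
	}
	[...]
}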


If you try it, you hit the BUG_ON(in_interrupt()) in __get_vm_area_node().
So even if you do pass GFP_ATOMIC in here, you can't call it from interrupt
context. (Sanity prevails!)
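
(Again trimmed from a v4.19-era mm/vmalloc.c, from memory; the check sits
right at the top:)

static struct vm_struct *__get_vm_area_node(unsigned long size,
		unsigned long align, unsigned long flags, unsigned long start,
		unsigned long end, int node, gfp_t gfp_mask, const void *caller)
{
	struct vmap_area *va;
	struct vm_struct *area;

	BUG_ON(in_interrupt());
	[...]
}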

I was wrong; it doesn't need fixing.


James


Re: [PATCH v5 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.

2018-10-01 Thread James Morse
Hi Mark,

On 24/09/18 17:36, Mark Rutland wrote:
> On Mon, Sep 17, 2018 at 12:43:32PM +0800, Jun Yao wrote:
>> Since we will move swapper_pg_dir to the rodata section, we need a
>> way to update it. The fixmap can handle this: when swapper_pg_dir
>> needs to be updated, we map it dynamically, and the mapping is
>> removed once the update is complete. In this way, we can defend
>> against KSMA (Kernel Space Mirror Attack).

>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 71532bcd76c1..a8a60927f716 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -67,6 +67,24 @@ static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
>>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>>  
>> +static DEFINE_SPINLOCK(swapper_pgdir_lock);
>> +
>> +void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
>> +{
>> +   pgd_t *fixmap_pgdp;
>> +
>> +   spin_lock(&swapper_pgdir_lock);
>> +   fixmap_pgdp = pgd_set_fixmap(__pa(pgdp));
>> +   WRITE_ONCE(*fixmap_pgdp, pgd);
>> +   /*
>> +    * We need dsb(ishst) here to ensure the page-table-walker sees
>> +    * our new entry before set_p?d() returns. The fixmap's
>> +    * flush_tlb_kernel_range() via clear_fixmap() does this for us.
>> +    */
>> +   pgd_clear_fixmap();
>> +   spin_unlock(&swapper_pgdir_lock);
>> +}
> 
> I'm rather worried that we could deadlock here.

We can use the irqsave versions if you're worried, but I think any code doing
this is already broken.
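
(For illustration, a minimal sketch of that irqsave variant, assuming nothing
else in set_swapper_pgd() changes:)

void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
{
	pgd_t *fixmap_pgdp;
	unsigned long flags;

	spin_lock_irqsave(&swapper_pgdir_lock, flags);
	fixmap_pgdp = pgd_set_fixmap(__pa(pgdp));
	WRITE_ONCE(*fixmap_pgdp, pgd);
	/* clear_fixmap()'s tlb flush gives us the dsb(ishst), as before */
	pgd_clear_fixmap();
	spin_unlock_irqrestore(&swapper_pgdir_lock, flags);
}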

(I'd like to eventually depend on the init_mm.page_table_lock for this, but it
isn't held when the vmemmap is being populated.)


> Are we certain we never poke the kernel page tables in IRQ context?

The RAS code was doing this, but it was deemed unsafe and was changed to use
the fixmap: https://lkml.org/lkml/2017/10/30/500
The fixmap only ever touches the last level, so it can't hit this.
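
(To make "last level" concrete: arm64's __set_fixmap() in arch/arm64/mm/mmu.c
only ever installs or clears a pte, so it never comes back through set_p?d().
Quoted from memory of this series' baseline:)

void __set_fixmap(enum fixed_addresses idx,
		  phys_addr_t phys, pgprot_t flags)
{
	unsigned long addr = __fix_to_virt(idx);
	pte_t *ptep;

	BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);

	ptep = fixmap_pte(addr);

	if (pgprot_val(flags)) {
		set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
	} else {
		pte_clear(&init_mm, addr, ptep);
		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
	}
}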

x86 can't do its IPI tlb-maintenance from IRQ context, so anything trying to
unmap from irq context is already broken: https://lkml.org/lkml/2018/9/6/324

vunmap()/vfree() is allowed from irq context, but it defers its work.

I can't find any way to pass GFP_ATOMIC into ioremap(), and I didn't think
vmalloc() could either... but now I spot that __vmalloc() does.

This __vmalloc() path is used by the percpu allocator: starting from
pcpu_alloc(), it can be passed something other than GFP_KERNEL, and it uses
spin_lock_irqsave(), so it expects to be called from irq context.

... so yes it looks like this can happen.

I'll post a fix.


Thanks!

James


Re: [PATCH v5 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.

2018-09-24 Thread Mark Rutland
On Mon, Sep 17, 2018 at 12:43:32PM +0800, Jun Yao wrote:
> Since we will move swapper_pg_dir to the rodata section, we need a
> way to update it. The fixmap can handle this: when swapper_pg_dir
> needs to be updated, we map it dynamically, and the mapping is
> removed once the update is complete. In this way, we can defend
> against KSMA (Kernel Space Mirror Attack).
> 
> Signed-off-by: Jun Yao 
> ---
>  arch/arm64/include/asm/pgtable.h | 38 ++--
>  arch/arm64/mm/mmu.c  | 25 +++--
>  2 files changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index b11d6fc62a62..9e643fc2453d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -429,8 +429,29 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>PUD_TYPE_TABLE)
>  #endif
>  
> +extern pgd_t init_pg_dir[PTRS_PER_PGD];
> +extern pgd_t init_pg_end[];
> +extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> +extern pgd_t swapper_pg_end[];
> +extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
> +extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
> +
> +extern void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
> +
> +static inline bool in_swapper_pgdir(void *addr)
> +{
> + return ((unsigned long)addr & PAGE_MASK) ==
> + ((unsigned long)swapper_pg_dir & PAGE_MASK);
> +}
> +
>  static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>  {
> +#ifdef __PAGETABLE_PMD_FOLDED
> + if (in_swapper_pgdir(pmdp)) {
> + set_swapper_pgd((pgd_t *)pmdp, __pgd(pmd_val(pmd)));
> + return;
> + }
> +#endif

So that we can get consistent build coverage, could we make this:

if (__is_defined(__PAGETABLE_PMD_FOLDED) && in_swapper_pgdir(pmdp)) {
set_swapper_pgd((pgd_t *)pmdp, __pgd(pmd_val(pmd)));
return;
}

>   WRITE_ONCE(*pmdp, pmd);
>  
>   if (pmd_valid(pmd))
> @@ -484,6 +505,12 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  
>  static inline void set_pud(pud_t *pudp, pud_t pud)
>  {
> +#ifdef __PAGETABLE_PUD_FOLDED
> + if (in_swapper_pgdir(pudp)) {
> + set_swapper_pgd((pgd_t *)pudp, __pgd(pud_val(pud)));
> + return;
> + }
> +#endif

... and likewise:

if (__is_defined(__PAGETABLE_PUD_FOLDED) && in_swapper_pgdir(pudp)) {
set_swapper_pgd((pgd_t *)pudp, __pgd(pud_val(pud)));
return;
}
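
(For reference, __is_defined() lives in include/linux/kconfig.h; trimmed, it
is built from:)

#define __ARG_PLACEHOLDER_1			0,
#define __take_second_arg(__ignored, val, ...)	val

#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0)

(Note the trick only yields 1 for macros defined as 1, the way Kconfig-style
symbols are; anything else falls through to the 0 branch.)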

Thanks,
Mark.


Re: [PATCH v5 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.

2018-09-24 Thread Mark Rutland
On Mon, Sep 17, 2018 at 12:43:32PM +0800, Jun Yao wrote:
> Since we will move swapper_pg_dir to the rodata section, we need a
> way to update it. The fixmap can handle this: when swapper_pg_dir
> needs to be updated, we map it dynamically, and the mapping is
> removed once the update is complete. In this way, we can defend
> against KSMA (Kernel Space Mirror Attack).
> 
> Signed-off-by: Jun Yao 
> ---
>  arch/arm64/include/asm/pgtable.h | 38 ++--
>  arch/arm64/mm/mmu.c  | 25 +++--
>  2 files changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index b11d6fc62a62..9e643fc2453d 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -429,8 +429,29 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>PUD_TYPE_TABLE)
>  #endif
>  
> +extern pgd_t init_pg_dir[PTRS_PER_PGD];
> +extern pgd_t init_pg_end[];
> +extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> +extern pgd_t swapper_pg_end[];
> +extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
> +extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
> +
> +extern void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
> +
> +static inline bool in_swapper_pgdir(void *addr)
> +{
> + return ((unsigned long)addr & PAGE_MASK) ==
> + ((unsigned long)swapper_pg_dir & PAGE_MASK);
> +}
> +
>  static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>  {
> +#ifdef __PAGETABLE_PMD_FOLDED
> + if (in_swapper_pgdir(pmdp)) {
> + set_swapper_pgd((pgd_t *)pmdp, __pgd(pmd_val(pmd)));
> + return;
> + }
> +#endif
>   WRITE_ONCE(*pmdp, pmd);
>  
>   if (pmd_valid(pmd))
> @@ -484,6 +505,12 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  
>  static inline void set_pud(pud_t *pudp, pud_t pud)
>  {
> +#ifdef __PAGETABLE_PUD_FOLDED
> + if (in_swapper_pgdir(pudp)) {
> + set_swapper_pgd((pgd_t *)pudp, __pgd(pud_val(pud)));
> + return;
> + }
> +#endif
>   WRITE_ONCE(*pudp, pud);
>  
>   if (pud_valid(pud))
> @@ -538,6 +565,10 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  
>  static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
>  {
> + if (in_swapper_pgdir(pgdp)) {
> + set_swapper_pgd(pgdp, pgd);
> + return;
> + }

It's somewhat frustrating that we have to duplicate this logic across
all of set_p{m,u,g}d(), rather than this living in set_pgd(), passing
the value up set_pmd() -> set_pud() -> set_pgd().

I see that the generic no-p{m,u}d headers force this structure, and I
haven't come up with anything better. :/
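
(The folding that forces it, trimmed from include/asm-generic/pgtable-nopmd.h:
with the pmd folded, a pmd_t * is really a pointer into the level above, which
is how set_pmd() can end up aimed at a swapper_pg_dir slot:)

typedef struct { pud_t pud; } pmd_t;

#define PTRS_PER_PMD	1

static inline pmd_t * pmd_offset(pud_t * pud, unsigned long address)
{
	return (pmd_t *)pud;
}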

>   WRITE_ONCE(*pgdp, pgd);
>   dsb(ishst);
>  }
> @@ -718,13 +749,6 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>  }
>  #endif
>  
> -extern pgd_t init_pg_dir[PTRS_PER_PGD];
> -extern pgd_t init_pg_end[];
> -extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> -extern pgd_t swapper_pg_end[];
> -extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
> -extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
> -
>  /*
>   * Encode and decode a swap entry:
>   *   bits 0-1:   present (must be zero)
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 71532bcd76c1..a8a60927f716 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -67,6 +67,24 @@ static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>  
> +static DEFINE_SPINLOCK(swapper_pgdir_lock);
> +
> +void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
> +{
> + pgd_t *fixmap_pgdp;
> +
> + spin_lock(&swapper_pgdir_lock);
> + fixmap_pgdp = pgd_set_fixmap(__pa(pgdp));
> + WRITE_ONCE(*fixmap_pgdp, pgd);
> + /*
> +  * We need dsb(ishst) here to ensure the page-table-walker sees
> +  * our new entry before set_p?d() returns. The fixmap's
> +  * flush_tlb_kernel_range() via clear_fixmap() does this for us.
> +  */
> + pgd_clear_fixmap();
> + spin_unlock(&swapper_pgdir_lock);
> +}

I'm rather worried that we could deadlock here.

Are we certain we never poke the kernel page tables in IRQ context?

Otherwise, this looks fine to me.

Thanks,
Mark.

> +
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> unsigned long size, pgprot_t vma_prot)
>  {
> @@ -629,8 +647,11 @@ static void __init map_kernel(pgd_t *pgdp)
>   */
>  void __init paging_init(void)
>  {
> - map_kernel(swapper_pg_dir);
> - map_mem(swapper_pg_dir);
> + pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
> +
> + map_kernel(pgdp);
> + map_mem(pgdp);
> + pgd_clear_fixmap();
>   cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
>   init_mm.pgd = swapper_pg_dir;
>  }
> -- 
> 2.17.1
> 


[PATCH v5 5/6] arm64/mm: Populate the swapper_pg_dir by fixmap.

2018-09-16 Thread Jun Yao
Since we will move swapper_pg_dir to the rodata section, we need a
way to update it. The fixmap can handle this: when swapper_pg_dir
needs to be updated, we map it dynamically, and the mapping is
removed once the update is complete. In this way, we can defend
against KSMA (Kernel Space Mirror Attack).

Signed-off-by: Jun Yao 
---
 arch/arm64/include/asm/pgtable.h | 38 ++--
 arch/arm64/mm/mmu.c  | 25 +++--
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b11d6fc62a62..9e643fc2453d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -429,8 +429,29 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 PUD_TYPE_TABLE)
 #endif
 
+extern pgd_t init_pg_dir[PTRS_PER_PGD];
+extern pgd_t init_pg_end[];
+extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
+extern pgd_t swapper_pg_end[];
+extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
+extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
+
+extern void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd);
+
+static inline bool in_swapper_pgdir(void *addr)
+{
+   return ((unsigned long)addr & PAGE_MASK) ==
+   ((unsigned long)swapper_pg_dir & PAGE_MASK);
+}
+
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
+#ifdef __PAGETABLE_PMD_FOLDED
+   if (in_swapper_pgdir(pmdp)) {
+   set_swapper_pgd((pgd_t *)pmdp, __pgd(pmd_val(pmd)));
+   return;
+   }
+#endif
WRITE_ONCE(*pmdp, pmd);
 
if (pmd_valid(pmd))
@@ -484,6 +505,12 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
 {
+#ifdef __PAGETABLE_PUD_FOLDED
+   if (in_swapper_pgdir(pudp)) {
+   set_swapper_pgd((pgd_t *)pudp, __pgd(pud_val(pud)));
+   return;
+   }
+#endif
WRITE_ONCE(*pudp, pud);
 
if (pud_valid(pud))
@@ -538,6 +565,10 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
 
 static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
+   if (in_swapper_pgdir(pgdp)) {
+   set_swapper_pgd(pgdp, pgd);
+   return;
+   }
WRITE_ONCE(*pgdp, pgd);
dsb(ishst);
 }
@@ -718,13 +749,6 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 }
 #endif
 
-extern pgd_t init_pg_dir[PTRS_PER_PGD];
-extern pgd_t init_pg_end[];
-extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
-extern pgd_t swapper_pg_end[];
-extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
-extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
-
 /*
  * Encode and decode a swap entry:
  * bits 0-1:   present (must be zero)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 71532bcd76c1..a8a60927f716 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -67,6 +67,24 @@ static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
 static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
 static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
 
+static DEFINE_SPINLOCK(swapper_pgdir_lock);
+
+void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+   pgd_t *fixmap_pgdp;
+
+   spin_lock(&swapper_pgdir_lock);
+   fixmap_pgdp = pgd_set_fixmap(__pa(pgdp));
+   WRITE_ONCE(*fixmap_pgdp, pgd);
+   /*
+    * We need dsb(ishst) here to ensure the page-table-walker sees
+    * our new entry before set_p?d() returns. The fixmap's
+    * flush_tlb_kernel_range() via clear_fixmap() does this for us.
+    */
+   pgd_clear_fixmap();
+   spin_unlock(&swapper_pgdir_lock);
+}
+
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
  unsigned long size, pgprot_t vma_prot)
 {
@@ -629,8 +647,11 @@ static void __init map_kernel(pgd_t *pgdp)
  */
 void __init paging_init(void)
 {
-   map_kernel(swapper_pg_dir);
-   map_mem(swapper_pg_dir);
+   pgd_t *pgdp = pgd_set_fixmap(__pa_symbol(swapper_pg_dir));
+
+   map_kernel(pgdp);
+   map_mem(pgdp);
+   pgd_clear_fixmap();
cpu_replace_ttbr1(lm_alias(swapper_pg_dir));
init_mm.pgd = swapper_pg_dir;
 }
-- 
2.17.1