Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-20 Thread Thomas Gleixner
On Wed, 20 Dec 2017, Juergen Gross wrote:
> On 20/12/17 01:22, Thomas Gleixner wrote:
> > On Tue, 19 Dec 2017, Thomas Gleixner wrote:
> >> On Tue, 19 Dec 2017, Ingo Molnar wrote:
> >> We don't run out of space, but the 0-day robot triggered a nasty issue.
> >>
> >> The fixmap bottom address, which contains the early_ioremap fixmap area, is:
> >>
> >> vaddr_bt = FIXADDR_TOP - FIX_BTMAP_BEGIN * PAGE_SIZE
> >>
> >> If that address is lower than:
> >>
> >> vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
> >>
> >> then cleanup_highmap() will happily 0 out the PMD entry for the PTE page of
> >> FIX_BTMAP. That entry was set up earlier in early_ioremap_init().
> >>
> >> As a consequence the first call to __early_set_fixmap() which tries to
> >> install a PTE for early_ioremap() will crash and burn.
> >>
> >> Below is a nasty hack which fixes the problem. Ideally we get all of this
> >> cpu_entry_area stuff out of the fixmap. I'll look into that later, but for now
> >> the patch 'fixes' the issue.
> > 
> > I had a stab at moving the cpu_entry_area to some other place.
> > 
> > The patch below works, but:
> > 
> >  - it breaks i386 build because I have not yet found a way to place the
> >    CPU_ENTRY_AREA_BASE without creating include recursion hell
> > 
> >  - it probably does not work on XEN_PV, but I'm too tired now to figure
> >    that out.
> 
> The attached patch lets the system come up as XEN_PV.

I folded it back to the proper place.

Thanks for looking!

   tglx


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-20 Thread Juergen Gross
On 20/12/17 01:22, Thomas Gleixner wrote:
> On Tue, 19 Dec 2017, Thomas Gleixner wrote:
>> On Tue, 19 Dec 2017, Ingo Molnar wrote:
>> We don't run out of space, but the 0-day robot triggered a nasty issue.
>>
>> The fixmap bottom address, which contains the early_ioremap fixmap area, is:
>>
>> vaddr_bt = FIXADDR_TOP - FIX_BTMAP_BEGIN * PAGE_SIZE
>>
>> If that address is lower than:
>>
>> vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
>>
>> then cleanup_highmap() will happily 0 out the PMD entry for the PTE page of
>> FIX_BTMAP. That entry was set up earlier in early_ioremap_init().
>>
>> As a consequence the first call to __early_set_fixmap() which tries to
>> install a PTE for early_ioremap() will crash and burn.
>>
>> Below is a nasty hack which fixes the problem. Ideally we get all of this
>> cpu_entry_area stuff out of the fixmap. I'll look into that later, but for now
>> the patch 'fixes' the issue.
> 
> I had a stab at moving the cpu_entry_area to some other place.
> 
> The patch below works, but:
> 
>  - it breaks i386 build because I have not yet found a way to place the
>    CPU_ENTRY_AREA_BASE without creating include recursion hell
> 
>  - it probably does not work on XEN_PV, but I'm too tired now to figure
>    that out.

The attached patch lets the system come up as XEN_PV.


Juergen
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index e4a6fe8354f0..577fa8adb785 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include "vsyscall_trace.h"
@@ -351,15 +352,15 @@ void __init set_vsyscall_pgtable_user_bits(pgd_t *root)
 	pmd_t *pmd;
 
 	pgd = pgd_offset_pgd(root, VSYSCALL_ADDR);
-	pgd->pgd |= _PAGE_USER;
+	set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER));
 	p4d = p4d_offset(pgd, VSYSCALL_ADDR);
 #if CONFIG_PGTABLE_LEVELS >= 5
 	p4d->p4d |= _PAGE_USER;
 #endif
 	pud = pud_offset(p4d, VSYSCALL_ADDR);
-	pud->pud |= _PAGE_USER;
+	set_pud(pud, __pud(pud_val(*pud) | _PAGE_USER));
 	pmd = pmd_offset(pud, VSYSCALL_ADDR);
-	pmd->pmd |= _PAGE_USER;
+	set_pmd(pmd, __pmd(pmd_val(*pmd) | _PAGE_USER));
 }
 
 void __init map_vsyscall(void)
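The reason this matters on Xen PV: live page tables are mapped read-only
there, so a direct read-modify-write like pgd->pgd |= _PAGE_USER faults,
while the set_pgd()/set_pud()/set_pmd() helpers go through the paravirt ops
and end up as validated hypercalls. Roughly like this (a simplified sketch
of the paravirt indirection, not the verbatim kernel code):

/* Simplified sketch: under CONFIG_PARAVIRT the set_*() helpers dispatch
 * through pv_mmu_ops, which Xen PV points at hypercall-backed updates. */
static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
{
        PVOP_VCALL2(pv_mmu_ops.set_pgd, pgdp, native_pgd_val(pgd));
}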


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-19 Thread Thomas Gleixner
On Tue, 19 Dec 2017, Thomas Gleixner wrote:
> On Tue, 19 Dec 2017, Ingo Molnar wrote:
> We don't run out of space, but the 0-day robot triggered a nasty issue.
> 
> The fixmap bottom address, which contains the early_ioremap fixmap area, is:
> 
> vaddr_bt = FIXADDR_TOP - FIX_BTMAP_BEGIN * PAGE_SIZE
> 
> If that address is lower than:
> 
> vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;
> 
> then cleanup_highmap() will happily 0 out the PMD entry for the PTE page of
> FIX_BTMAP. That entry was set up earlier in early_ioremap_init().
> 
> As a consequence the first call to __early_set_fixmap() which tries to
> install a PTE for early_ioremap() will crash and burn.
> 
> Below is a nasty hack which fixes the problem. Ideally we get all of this
> cpu_entry_area stuff out of the fixmap. I'll look into that later, but for now
> the patch 'fixes' the issue.

I had a stab at moving the cpu_entry_area to some other place.

The patch below works, but:

 - it breaks i386 build because I have not yet found a way to place the
   CPU_ENTRY_AREA_BASE without creating include recursion hell

 - it probably does not work on XEN_PV, but I'm too tired now to figure
   that out.

Thanks,

tglx

8<---

 Documentation/x86/x86_64/mm.txt         |    4 
 arch/x86/events/intel/ds.c              |   53 ++--
 arch/x86/include/asm/desc.h             |    1 
 arch/x86/include/asm/fixmap.h           |   89 
 arch/x86/include/asm/pgtable_32_types.h |    6 -
 arch/x86/include/asm/pgtable_64_types.h |   49 ++-
 arch/x86/kernel/cpu/common.c            |  125 
 arch/x86/kernel/dumpstack.c             |    1 
 arch/x86/kernel/traps.c                 |    5 -
 arch/x86/mm/Makefile                    |    2 
 arch/x86/mm/dump_pagetables.c           |    2 
 arch/x86/mm/kasan_init_64.c             |    6 -
 arch/x86/mm/pti.c                       |   39 +++--
 arch/x86/xen/mmu_pv.c                   |    2 
 b/arch/x86/include/asm/cpu_entry_area.h |   79 ++
 b/arch/x86/mm/cpu_entry_area.c          |  138 
 16 files changed, 309 insertions(+), 292 deletions(-)

--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -12,7 +12,8 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
 ... unused hole ...
 ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
 ... unused hole ...
-fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
+fffffe0000000000 - fffffe7fffffffff (=39 bits) LDT remap for PTI
+fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
 ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ... unused hole ...
 ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
@@ -36,6 +37,7 @@ ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
 ... unused hole ...
 ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
 ... unused hole ...
+fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
 ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
 ... unused hole ...
 ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -76,36 +76,39 @@ typedef struct { pteval_t pte; } pte_t;
 #define PGDIR_MASK	(~(PGDIR_SIZE - 1))
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
-#define MAXMEM		_AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
+#define MAXMEM			_AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
 #ifdef CONFIG_X86_5LEVEL
-#define VMALLOC_SIZE_TB _AC(12800, UL)
-#define __VMALLOC_BASE	_AC(0xffa0000000000000, UL)
-#define __VMEMMAP_BASE	_AC(0xffd4000000000000, UL)
-#define LDT_PGD_ENTRY	_AC(-112, UL)
-#define LDT_BASE_ADDR	(LDT_PGD_ENTRY << PGDIR_SHIFT)
+#define VMALLOC_SIZE_TB		_AC(12800, UL)
+#define __VMALLOC_BASE		_AC(0xffa0000000000000, UL)
+#define __VMEMMAP_BASE		_AC(0xffd4000000000000, UL)
+#define LDT_PGD_ENTRY		_AC(-112, UL)
+#define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
 #else
-#define VMALLOC_SIZE_TB	_AC(32, UL)
-#define __VMALLOC_BASE	_AC(0xffffc90000000000, UL)
-#define __VMEMMAP_BASE	_AC(0xffffea0000000000, UL)
-#define LDT_PGD_ENTRY	_AC(-3, UL)
-#define LDT_BASE_ADDR	(LDT_PGD_ENTRY << PGDIR_SHIFT)
+#define VMALLOC_SIZE_TB		_AC(32, UL)
+#define __VMALLOC_BASE		_AC(0xffffc90000000000, UL)
+#define __VMEMMAP_BASE		_AC(0xffffea0000000000, UL)
+#define LDT_PGD_ENTRY		_AC(-4, UL)
+#define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
 #endif
 #ifdef CONFIG_RANDOMIZE_MEMORY
-#define VMALLOC_START	vmalloc_base
-#define VMEMMAP_START	vmemmap_base
+#define VMALLOC_START		vmalloc_base
+#define VMEMMAP_START		vmemmap_base
 #else
-#define VMALLOC_START	__VMALLOC_BASE
-#define VMEMMAP_START	__VMEMMAP_BASE
+#define VMALLOC_START		__VMALLOC_BASE
+#define VMEMMAP_START		__VMEMMAP_BASE
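As a sanity check, the sign-extended shift in LDT_BASE_ADDR reproduces the
ranges listed in the mm.txt hunk above. A standalone sketch (user-space C,
just for the arithmetic; not kernel code):

#include <stdio.h>

int main(void)
{
        /* PGDIR_SHIFT is 39 with 4-level paging; a negative PGD index
         * shifted up lands in the top PGD slots of the address space. */
        unsigned long ldt_base = (unsigned long)-4 << 39;  /* LDT_PGD_ENTRY */
        unsigned long cea_base = (unsigned long)-3 << 39;  /* cpu_entry_area */

        printf("%016lx\n", ldt_base);  /* fffffe0000000000 */
        printf("%016lx\n", cea_base);  /* fffffe8000000000 */
        return 0;
}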

Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-19 Thread Thomas Gleixner
On Tue, 19 Dec 2017, Ingo Molnar wrote:

> * Peter Zijlstra  wrote:
> 
> > On Mon, Dec 18, 2017 at 12:45:13PM -0800, Dave Hansen wrote:
> > > On 12/18/2017 12:41 PM, Peter Zijlstra wrote:
> > > >> I also don't think the user_shared area of the fixmap can get *that*
> > > >> big.  Does anybody know offhand what the theoretical limits are there?
> > > > Problem there is the nr_cpus term I think, we currently have up to 8k
> > > > CPUs, but I can see that getting bigger in the future.
> > > 
> > > It only matters if we go over 512GB, though.  Is the per-cpu part of the
> > > fixmap ever more than 512GB/8k=64MB?
> > 
> > Unlikely, I think the LDT (@ 32 pages / 128K) and the DS (@ 2*4 pages /
> > 32K) are the largest entries in there.
> 
> Note that with the latest state of things the LDT is not in the fixmap
> anymore, it's mapped separately, via Andy's following patch:
> 
>   e86aaee3f2d9: ("x86/pti: Put the LDT in its own PGD if PTI is on")
> 
> We have the IDT, the per-CPU entry area and the Debug Store (on Intel CPUs)
> mapped in the fixmap area, in addition to the usual fixmap entries that are
> a handful of pages. (That's on 64-bit - on 32-bit we have a pretty large
> kmap area.)
> 
> The biggest contribution to the size of the fixmap area is struct
> cpu_entry_area (FIX_CPU_ENTRY_AREA_BOTTOM..FIX_CPU_ENTRY_AREA_TOP), which
> is ~180k, i.e. 44 pages.
> 
> Our current NR_CPUS limit is 8,192 CPUs, but even with 65,536 CPUs the
> fixmap area would still only be ~12 GB total - so we are far from running
> out of space.

We don't run out of space, but the 0-day robot triggered a nasty issue.

The fixmap bottom address, which contains the early_ioremap fixmap area, is:

vaddr_bt = FIXADDR_TOP - FIX_BTMAP_BEGIN * PAGE_SIZE

If that address is lower than:

vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;

then cleanup_highmap() will happily 0 out the PMD entry for the PTE page of
FIX_BTMAP. That entry was set up earlier in early_ioremap_init().

As a consequence the first call to __early_set_fixmap() which tries to
install a PTE for early_ioremap() will crash and burn.
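To make the failure condition concrete, here is a minimal sketch of the
overlap check (illustrative only, not kernel code; it just restates the two
addresses above):

/*
 * Illustrative sketch only -- restates the overlap condition described
 * above using the same constants.
 */
static bool btmap_pmd_gets_zapped(void)
{
        unsigned long vaddr_bt  = FIXADDR_TOP - FIX_BTMAP_BEGIN * PAGE_SIZE;
        unsigned long vaddr_end = __START_KERNEL_map + KERNEL_IMAGE_SIZE;

        /* If true, cleanup_highmap() walks over the FIX_BTMAP pmd and
         * clears it, and the next __early_set_fixmap() blows up. */
        return vaddr_bt < vaddr_end;
}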

Below is a nasty hack which fixes the problem. Ideally we get all of this
cpu_entry_area stuff out of the fixmap. I'll look into that later, but for now
the patch 'fixes' the issue.

Thanks,

tglx
8<-
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -209,6 +209,8 @@ extern pte_t *kmap_pte;
 #define kmap_prot PAGE_KERNEL
 extern pte_t *pkmap_page_table;
 
+extern pmd_t *early_ioremap_page_table;
+
 void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
 void native_set_fixmap(enum fixed_addresses idx,
 		       phys_addr_t phys, pgprot_t flags);
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -393,6 +393,15 @@ void __init cleanup_highmap(void)
 	for (; vaddr + PMD_SIZE - 1 < vaddr_end; pmd++, vaddr += PMD_SIZE) {
 		if (pmd_none(*pmd))
 			continue;
+		/*
+		 * Careful here. vaddr_end might be past the pmd which is
+		 * used by the early ioremap stuff. Don't clean that out as
+		 * it's already set up.
+		 */
+		if (__phys_addr_nodebug((unsigned long) pmd) ==
+		    __phys_addr_nodebug((unsigned long) early_ioremap_page_table))
+			continue;
+
 		if (vaddr < (unsigned long) _text || vaddr > end)
 			set_pmd(pmd, __pmd(0));
 	}
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -27,6 +27,8 @@
 
 #include "physaddr.h"
 
+pmd_t __initdata *early_ioremap_page_table;
+
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.
@@ -709,7 +711,7 @@ void __init early_ioremap_init(void)
 	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
 	memset(bm_pte, 0, sizeof(bm_pte));
 	pmd_populate_kernel(&init_mm, pmd, bm_pte);
-
+	early_ioremap_page_table = pmd;
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which
 	 * we are not prepared:




Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Mon, Dec 18, 2017 at 12:45:13PM -0800, Dave Hansen wrote:
> > On 12/18/2017 12:41 PM, Peter Zijlstra wrote:
> > >> I also don't think the user_shared area of the fixmap can get *that*
> > >> big.  Does anybody know offhand what the theoretical limits are there?
> > > Problem there is the nr_cpus term I think, we currently have up to 8k
> > > CPUs, but I can see that getting bigger in the future.
> > 
> > It only matters if we go over 512GB, though.  Is the per-cpu part of the
> > fixmap ever more than 512GB/8k=64MB?
> 
> Unlikely, I think the LDT (@ 32 pages / 128K) and the DS (@ 2*4 pages /
> 32K) are the largest entries in there.

Note that with the latest state of things the LDT is not in the fixmap
anymore, it's mapped separately, via Andy's following patch:

  e86aaee3f2d9: ("x86/pti: Put the LDT in its own PGD if PTI is on")

We have the IDT, the per-CPU entry area and the Debug Store (on Intel CPUs)
mapped in the fixmap area, in addition to the usual fixmap entries that are
a handful of pages. (That's on 64-bit - on 32-bit we have a pretty large
kmap area.)

The biggest contribution to the size of the fixmap area is struct
cpu_entry_area (FIX_CPU_ENTRY_AREA_BOTTOM..FIX_CPU_ENTRY_AREA_TOP), which
is ~180k, i.e. 44 pages.

Our current NR_CPUS limit is 8,192 CPUs, but even with 65,536 CPUs the
fixmap area would still only be ~12 GB total - so we are far from running
out of space.
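A quick sanity check of that estimate, using the 44-pages-per-CPU figure
from the breakdown below:

  65,536 CPUs * 44 pages * 4K/page = 65,536 * ~176K ~= 11.8 GB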

Btw., the DS portion of the fixmap is not 64K but 128K. Here's the current
size distribution of struct cpu_entry_area:

  ::gdt               -    4K
  ::tss               -   12K
  ::entry_trampoline  -    4K
#if 64-bit
  ::exception_stacks  -   20K
#endif
#if Intel
  ::cpu_debug_store   -    4K
  ::cpu_debug_buffers -  128K
#endif

So ::cpu_debug_buffers (struct debug_store_buffers) is the biggest fixmap
chunk, by far, distributed the following way:

  ::bts_buffer[]      -   64K
  ::pebs_buffer[]     -   64K
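Those two 64K figures trace back to the buffer-size constants in
arch/x86/events/intel/ds.c, which (as best recalled here; worth
double-checking against the tree) look roughly like:

/* From arch/x86/events/intel/ds.c (approximate, assuming 4K pages): */
#define BTS_BUFFER_SIZE		(PAGE_SIZE << 4)	/* 64K */
#define PEBS_BUFFER_SIZE	(PAGE_SIZE << 4)	/* 64K */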

Thanks,

Ingo


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Andy Lutomirski
On Mon, Dec 18, 2017 at 12:34 PM, Dave Hansen  wrote:
> On 12/18/2017 03:42 AM, Thomas Gleixner wrote:
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -1120,6 +1120,11 @@ static inline void pmdp_set_wrprotect(st
>>  static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
>>  {
>> memcpy(dst, src, count * sizeof(pgd_t));
>> +#ifdef CONFIG_PAGE_TABLE_ISOLATION
>> + /* Clone the user space pgd as well */
>> + memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
>> +count * sizeof(pgd_t));
>> +#endif
>>  }
>
> I was just thinking about this as I re-write the documentation about
> where the overhead of pti comes from.
>
> This obviously *works* for now.  But, we certainly have the pti-mapped
> stuff spread much less through the address space than when this was
> thrown in here.  It *seems* like we could probably do this with just 4 PGDs:
>
>> pti_clone_user_shared();
>> pti_clone_entry_text();
>> pti_setup_espfix64();
>> pti_setup_vsyscall();
>
> The vsyscall is just one page and the espfix is *sized* to be one PGD,
> so we know each of those only takes one entry.
>
> We surely don't have 512GB of entry_text, and I don't think KASLR can
> ever cause it to span two PGD entries.

This would definitely work and, long-term, I think we should get rid
of the entry text mapping entirely.  The tricky bit is that we need to
rearrange the whole memory map fairly radically for this.

We could make it more compact, too: the vsyscall page and the
cpu_entry_area stuff can share a PGD.  The LDT could go in there, too.
The only requirement the LDT PGD has is that all of the next-level
entries that will ever be allocated are allocated at boot time.
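A minimal sketch of what "allocated at boot time" means here --
pre-populating the intermediate levels under the PGD slot so no later
update ever has to install a new top-level entry (the helper name is made
up for illustration; p4d_alloc()/pud_alloc() are the real allocators):

/* Hypothetical helper, illustration only: populate the p4d and pud under
 * one PGD slot at boot so later remaps never touch the PGD level (a new
 * PGD entry would have to be propagated to every mm). */
static void __init preallocate_pgd_slot(unsigned long addr)
{
        pgd_t *pgd = pgd_offset_k(addr);
        p4d_t *p4d = p4d_alloc(&init_mm, pgd, addr);

        if (p4d)
                pud_alloc(&init_mm, p4d, addr);
}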


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Peter Zijlstra
On Mon, Dec 18, 2017 at 12:45:13PM -0800, Dave Hansen wrote:
> On 12/18/2017 12:41 PM, Peter Zijlstra wrote:
> >> I also don't think the user_shared area of the fixmap can get *that*
> >> big.  Does anybody know offhand what the theoretical limits are there?
> > Problem there is the nr_cpus term I think, we currently have up to 8k
> > CPUs, but I can see that getting bigger in the future.
> 
> It only matters if we go over 512GB, though.  Is the per-cpu part of the
> fixmap ever more than 512GB/8k=64MB?

Unlikely, I think the LDT (@ 32 pages / 128K) and the DS (@ 2*4 pages /
32K) are the largest entries in there.


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Dave Hansen
On 12/18/2017 12:41 PM, Peter Zijlstra wrote:
>> I also don't think the user_shared area of the fixmap can get *that*
>> big.  Does anybody know offhand what the theoretical limits are there?
> Problem there is the nr_cpus term I think, we currently have up to 8k
> CPUs, but I can see that getting bigger in the future.

It only matters if we go over 512GB, though.  Is the per-cpu part of the
fixmap ever more than 512GB/8k=64MB?

In any case, it's not hard to make a little loop.  I think the common
case, at least, is that we only do 4 PGDs.
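Such a loop might look like the following sketch (entirely illustrative --
pti_user_addrs[] and pti_clone_pgds() are hypothetical names, not the real
pti.c API):

/* Hypothetical sketch of the "little loop": clone only the PGD entries
 * covering the handful of PTI-mapped regions, not the whole kernel half. */
static const unsigned long pti_user_addrs[] = {
        VSYSCALL_ADDR,          /* one page */
        ESPFIX_BASE_ADDR,       /* sized to one PGD */
        /* + entry text base, + user-shared fixmap base ... */
};

static void __init pti_clone_pgds(pgd_t *user_pgd, pgd_t *kernel_pgd)
{
        int i;

        for (i = 0; i < ARRAY_SIZE(pti_user_addrs); i++) {
                unsigned long idx = pgd_index(pti_user_addrs[i]);

                user_pgd[idx] = kernel_pgd[idx];
        }
}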


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Peter Zijlstra
On Mon, Dec 18, 2017 at 12:34:22PM -0800, Dave Hansen wrote:
> On 12/18/2017 03:42 AM, Thomas Gleixner wrote:
> > --- a/arch/x86/include/asm/pgtable.h
> > +++ b/arch/x86/include/asm/pgtable.h
> > @@ -1120,6 +1120,11 @@ static inline void pmdp_set_wrprotect(st
> >  static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
> >  {
> > memcpy(dst, src, count * sizeof(pgd_t));
> > +#ifdef CONFIG_PAGE_TABLE_ISOLATION
> > +   /* Clone the user space pgd as well */
> > +   memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
> > +  count * sizeof(pgd_t));
> > +#endif
> >  }
> 
> I was just thinking about this as I re-write the documentation about
> where the overhead of pti comes from.

The thing I thought of when I saw this earlier today was that this could
trivially be wrapped in a static_cpu_has(X86_FEATURE_PTI).
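I.e., presumably something along these lines (a sketch of that suggestion,
not a tested patch):

static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
{
        memcpy(dst, src, count * sizeof(pgd_t));
#ifdef CONFIG_PAGE_TABLE_ISOLATION
        /* Skip the extra copy when PTI is disabled at runtime */
        if (static_cpu_has(X86_FEATURE_PTI))
                memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
                       count * sizeof(pgd_t));
#endif
}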

> This obviously *works* for now.  But, we certainly have the pti-mapped
> stuff spread much less through the address space than when this was
> thrown in here.  It *seems* like we could probably do this with just 4 PGDs:
> 
> > pti_clone_user_shared();
> > pti_clone_entry_text();
> > pti_setup_espfix64();
> > pti_setup_vsyscall();
> 
> The vsyscall is just one page and the espfix is *sized* to be one PGD,
> so we know each of those only takes one entry.
> 
> We surely don't have 512GB of entry_text, and I don't think KASLR can
> ever cause it to span two PGD entries.
> 
> I also don't think the user_shared area of the fixmap can get *that*
> big.  Does anybody know offhand what the theoretical limits are there?

Problem there is the nr_cpus term I think, we currently have up to 8k
CPUs, but I can see that getting bigger in the future.


Re: [patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Dave Hansen
On 12/18/2017 03:42 AM, Thomas Gleixner wrote:
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1120,6 +1120,11 @@ static inline void pmdp_set_wrprotect(st
>  static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
>  {
> memcpy(dst, src, count * sizeof(pgd_t));
> +#ifdef CONFIG_PAGE_TABLE_ISOLATION
> + /* Clone the user space pgd as well */
> + memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
> +count * sizeof(pgd_t));
> +#endif
>  }

I was just thinking about this as I re-write the documentation about
where the overhead of pti comes from.

This obviously *works* for now.  But, we certainly have the pti-mapped
stuff spread much less through the address space than when this was
thrown in here.  It *seems* like we could probably do this with just 4 PGDs:

> pti_clone_user_shared();
> pti_clone_entry_text();
> pti_setup_espfix64();
> pti_setup_vsyscall();

The vsyscall is just one page and the espfix is *sized* to be one PGD,
so we know each of those only takes one entry.

We surely don't have 512GB of entry_text, and I don't think KASLR can
ever cause it to span two PGD entries.

I also don't think the user_shared area of the fixmap can get *that*
big.  Does anybody know offhand what the theoretical limits are there?


[patch V163 27/51] x86/mm/pti: Populate user PGD

2017-12-18 Thread Thomas Gleixner
From: Dave Hansen 

In clone_pgd_range() copy the init user PGDs which cover the kernel half of
the address space, so a process has all the required kernel mappings
visible.

[ tglx: Split out from the big kaiser dump and folded Andy's simplification ]

Signed-off-by: Dave Hansen 
Signed-off-by: Ingo Molnar 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Boris Ostrovsky 
Cc: Brian Gerst 
Cc: David Laight 
Cc: Denys Vlasenko 
Cc: Eduardo Valentin 
Cc: Greg KH 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Will Deacon 
Cc: aligu...@amazon.com
Cc: daniel.gr...@iaik.tugraz.at
Cc: hu...@google.com
Cc: keesc...@google.com
---
 arch/x86/include/asm/pgtable.h |5 +
 1 file changed, 5 insertions(+)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1120,6 +1120,11 @@ static inline void pmdp_set_wrprotect(st
 static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
 {
 	memcpy(dst, src, count * sizeof(pgd_t));
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	/* Clone the user space pgd as well */
+	memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
+	       count * sizeof(pgd_t));
+#endif
+#endif
 }
 
 #define PTE_SHIFT ilog2(PTRS_PER_PTE)
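For context: clone_pgd_range() is called from pgd_ctor() in
arch/x86/mm/pgtable.c roughly as below, so with this patch a freshly
allocated pgd gets its user-space copy of the kernel half populated in the
same step (paraphrased for illustration; not part of the patch):

/* Paraphrased caller (pgd_ctor()): copy the kernel half of swapper_pg_dir
 * into each newly allocated pgd; the hunk above makes the same call also
 * fill the PTI user-space pgd. */
clone_pgd_range(pgd + KERNEL_PGD_BOUNDARY,
		swapper_pg_dir + KERNEL_PGD_BOUNDARY,
		KERNEL_PGD_PTRS);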