On Fri, Jun 17, 2016 at 3:26 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Kees Cook <keesc...@chromium.org> wrote:
>
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1993,6 +1993,23 @@ config PHYSICAL_ALIGN
>>
>>        Don't change this unless you know what you are doing.
>>
>> +config RANDOMIZE_MEMORY
>> +     bool "Randomize the kernel memory sections"
>> +     depends on X86_64
>> +     depends on RANDOMIZE_BASE
>> +     default RANDOMIZE_BASE
>> +     ---help---
>> +        Randomizes the base virtual address of kernel memory sections
>> +        (physical memory mapping, vmalloc & vmemmap). This security feature
>> +        makes exploits relying on predictable memory locations less reliable.
>> +
>> +        The order of allocations remains unchanged. Entropy is generated in
>> +        the same way as RANDOMIZE_BASE. Current implementation in the optimal
>> +        configuration have in average 30,000 different possible virtual
>> +        addresses for each memory section.
>> +
>> +        If unsure, say N.
>
> So this is really poor naming!
>
> The user, in first approximation, will see something like this:
>
>   Randomize the kernel memory sections (RANDOMIZE_MEMORY) [Y/n/?] (NEW)
>
> ... and could naively conclude that this feature will randomize memory
> contents.
>
> Furthermore, there's no apparent connection to another, related kernel
> feature:
>
>   Randomize the address of the kernel image (KASLR) (RANDOMIZE_BASE) [Y/n/?]
>
> A better naming would be something like:
>
>   Randomize other static kernel memory addresses (RANDOMIZE_ALL) [Y/n/?] (NEW)
>
> (assuming it's truly 'all')
>
> But ... I don't see it explained anywhere why a user, once he expressed interest
> in randomization would even want to _not_ randomize as much as possible. I.e. why
> does this new option exist at all, shouldn't we just gradually extend
> randomization to more memory areas and control it via CONFIG_RANDOMIZE_BASE?
>
> Btw., CONFIG_RANDOMIZE_BASE is probably a misnomer, and after these patches it
> will be more of a misnomer - so we should rename it to something better. For
> example CONFIG_KASLR would have the advantage of being obviously named at a
> glance, to anyone who knows what 'ASLR' means. To those who don't know the short
> description will tell it:
>
>   Kernel address space layout randomization (KASLR) [Y/n/?]
>
> This would also fit the kernel internal naming of kaslr.c/etc.
>
> What do you think?
>
>> +++ b/arch/x86/mm/kaslr.c
>> @@ -0,0 +1,167 @@
>> +/*
>> + * This file implements KASLR memory randomization for x86_64. It randomizes
>> + * the virtual address space of kernel memory sections (physical memory
>> + * mapping, vmalloc & vmemmap) for x86_64. This security feature mitigates
>> + * exploits relying on predictable kernel addresses.
>> + *
>> + * Entropy is generated using the KASLR early boot functions now shared in
>> + * the lib directory (originally written by Kees Cook). Randomization is
>> + * done on PGD & PUD page table levels to increase possible addresses. The
>> + * physical memory mapping code was adapted to support PUD level virtual
>> + * addresses. This implementation on the best configuration provides 30,000
>> + * possible virtual addresses in average for each memory section. An
>> + * additional low memory page is used to ensure each CPU can start with a
>> + * PGD aligned virtual address (for realmode).
>> + *
>> + * The order of each memory section is not changed. The feature looks at
>> + * the available space for the sections based on different configuration
>> + * options and randomizes the base and space between each. The size of the
>> + * physical memory mapping is the available physical memory.
>> + *
>> + */
>
> (Stray newline.)
>
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +#include <linux/mm.h>
>> +#include <linux/smp.h>
>> +#include <linux/init.h>
>> +#include <linux/memory.h>
>> +#include <linux/random.h>
>> +
>> +#include <asm/processor.h>
>> +#include <asm/pgtable.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/e820.h>
>> +#include <asm/init.h>
>> +#include <asm/setup.h>
>> +#include <asm/kaslr.h>
>> +#include <asm/kasan.h>
>> +
>> +#include "mm_internal.h"
>
> How many of those include lines are truly unnecessary, or did this just get copied
> from another file?
>
That's correct, it was copied over. I will clean that up.

>> +/*
>> + * Memory base and end randomization is based on different configurations.
>> + * We want as much space as possible to increase entropy available.
>> + */
>> +static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
>> +
>> +#if defined(CONFIG_KASAN)
>> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
>> +#elif defined(CONFIG_X86_ESPFIX64)
>> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
>> +#elif defined(CONFIG_EFI)
>> +static const unsigned long memory_rand_end = EFI_VA_START;
>> +#else
>> +static const unsigned long memory_rand_end = __START_KERNEL_map;
>> +#endif
>
> It's unclear to me from the naming and the description whether these are virtual
> or physical memory ranges. Nor is it explained why this laundry list of kernel
> options influences the randomization range - and it's not explained how new kernel
> features modifying the virtual memory layout should plug into this mechanism.
>
> Could we please get a bit more verbose introduction into why all of this is done?
>

The comment was meant to describe that we want as much space as possible for
entropy. I will make that clearer, and I will add information on how to change
these variables when a new feature modifies the virtual memory layout.

>> +/* Default values */
>> +unsigned long page_offset_base = __PAGE_OFFSET_BASE;
>> +EXPORT_SYMBOL(page_offset_base);
>> +unsigned long vmalloc_base = __VMALLOC_BASE;
>> +EXPORT_SYMBOL(vmalloc_base);
>> +unsigned long vmemmap_base = __VMEMMAP_BASE;
>> +EXPORT_SYMBOL(vmemmap_base);
>> +
>> +/* Describe each randomized memory sections in sequential order */
>> +static __initdata struct kaslr_memory_region {
>> +     unsigned long *base;
>> +     unsigned short size_tb;
>> +} kaslr_regions[] = {
>> +     { &page_offset_base, 64/* Maximum */ },
>> +     { &vmalloc_base, VMALLOC_SIZE_TB },
>> +     { &vmemmap_base, 1 },
>> +};
>
> So this comment is totally misleading.
> This area does not 'describe' each memory
> section, it actually puts live pointers to virtual memory address offset control
> variables used by other subsystems into this array - allowing us to randomize them
> programmatically.
>
> This patch should be split up into 4 components:
>
>  - core patch adding all the infrastructure but not actually randomizing anything,
>    the kaslr_regions[] array is empty.
>
>  - a patch adding page_offset_base randomization.
>
>  - a patch adding vmalloc_base randomization.
>
>  - a patch adding vmemmap_base randomization.
>
> This way if any breakage is introduced by this randomization it can be bisected to
> in a finegrained fashion. It would also have saved 5 minutes from my life trying
> to interpret the code here :-/
>
> In fact I'd suggest a series of 7 patches, with each randomization patch split
> into two further patches:
>
>  - a patch making address offset variables dynamic.
>
>  - a patch actually enabling the randomization of that virtual address range.
>
> This structure allows us to carefully evaluate the code size and runtime impact of
> randomizing a given range.
>

Makes sense. I will build a 4-component patch series as described.

>> +
>> +/* Size in Terabytes + 1 hole */
>> +static __init unsigned long get_padding(struct kaslr_memory_region *region)
>> +{
>> +     return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
>> +}
>
> So why is size_tb a short, forcing ugly type casts like this? Also, since this
> appears to be a trivial shift, why isn't this an inline?
>

Changed.
>> +/* Initialize base and padding for each memory section randomized with KASLR */
>> +void __init kernel_randomize_memory(void)
>> +{
>> +     size_t i;
>> +     unsigned long addr = memory_rand_start;
>> +     unsigned long padding, rand, mem_tb;
>> +     struct rnd_state rnd_st;
>> +     unsigned long remain_padding = memory_rand_end - memory_rand_start;
>
> Yeah, so these variable names are awful :-/
>
> Look at how postfix and prefix naming is mixed in the same function:
> 'memory_rand_start', but 'remain_padding'. Please use postfixes for all of them.
>
> Also why is one variable named 'rand', another 'rnd'? Did we run out of 'a'
> characters?
>
> Similarly, we have 'memory' and 'mem'. Pick one variant and harmonize!
>
> This patch is an object lesson in how fragile, insecure code is written.
>

Will change.

>> +
>> +     /*
>> +      * All these BUILD_BUG_ON checks ensures the memory layout is
>> +      * consistent with the current KASLR design.
>> +      */
>> +     BUILD_BUG_ON(memory_rand_start >= memory_rand_end);
>> +     BUILD_BUG_ON(config_enabled(CONFIG_KASAN) &&
>> +             memory_rand_end >= ESPFIX_BASE_ADDR);
>> +     BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> +             config_enabled(CONFIG_X86_ESPFIX64)) &&
>> +             memory_rand_end >= EFI_VA_START);
>> +     BUILD_BUG_ON((config_enabled(CONFIG_KASAN) ||
>> +             config_enabled(CONFIG_X86_ESPFIX64) ||
>> +             config_enabled(CONFIG_EFI)) &&
>> +             memory_rand_end >= __START_KERNEL_map);
>> +     BUILD_BUG_ON(memory_rand_end > __START_KERNEL_map);
>> +
>> +     if (!kaslr_enabled())
>> +             return;
>> +
>> +     BUG_ON(kaslr_regions[0].base != &page_offset_base);
>> +     mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
>
> Why is the granularity of the randomization 1TB? I don't see this explained
> anywhere. Some of the randomization may have PUD-granularity limitation - but PUD
> entries have a size of 512 GB.
>
> I think we should extend the kaslr_regions[] table with a minimum required
> alignment field instead of random granularity limitations.
>

I will change it so that the next region can start on the next PUD entry with
minimum padding.

>> +/*
>> + * Create PGD aligned trampoline table to allow real mode initialization
>> + * of additional CPUs. Consume only 1 low memory page.
>> + */
>> +void __meminit init_trampoline(void)
>
> This too should probably be a separate patch, for bisectability and reviewability
> reasons.
>

Will do.

>> +     unsigned long addr, next;
>> +     pgd_t *pgd;
>> +     pud_t *pud_page, *tr_pud_page;
>> +     int i;
>
> I had to look twice to understand that 'tr' stands for 'trampoline'. Please rename
> to 'pud_page_tramp' or so.
>

Will do.

>> +     if (!kaslr_enabled())
>> +             return;
>> +
>> +     tr_pud_page = alloc_low_page();
>> +     set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
>
> Shouldn't we set it _after_ we've initialized the tr_pud_page page table?
>
> Also, why _PAGE_TABLE? AFAICS using _PAGE_TABLE adds _PAGE_USER.
>

It is used by pgd_populate in kernel_physical_mapping_init. I will update it to
_KERNPG_TABLE.

>> +
>> +     addr = 0;
>> +     pgd = pgd_offset_k((unsigned long)__va(addr));
>
> Why the type cast?
>
> Also, similar complaint to the first patch: why is a physical address named in an
> ambiguous fashion, as 'addr'?
>

Will adapt it as in the other patch.

>> +     pud_page = (pud_t *) pgd_page_vaddr(*pgd);
>
> Side note: it appears all users of pgd_page_vaddr() are casting it immediately to
> a pointer type. Could we change the return type of this function to 'void *' or
> 'pud_t *'? AFAICS it's mostly used in platform code.
>

I agree, but I think that should be a separate change.

>> +
>> +     for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
>> +             pud_t *pud, *tr_pud;
>> +
>> +             tr_pud = tr_pud_page + pud_index(addr);
>> +             pud = pud_page + pud_index((unsigned long)__va(addr));
>
> Why is this type cast needed?
>

Because __va returns a pointer and pud_index expects an unsigned long. I will
edit it in a similar way as in the other patch.
>> +             next = (addr & PUD_MASK) + PUD_SIZE;
>> +
>> +             /* Needed to copy pte or pud alike */
>> +             BUILD_BUG_ON(sizeof(pud_t) != sizeof(pte_t));
>> +             *tr_pud = *pud;
>
> So I don't understand what brought the size of 'pte' into this code. We are
> copying pud entries around AFAICS, why should the size of the pte entry matter?
>

To support PSE. Depending on the page_size_mask, this entry can be a pud entry
or a pte. I will remove the BUILD_BUG_ON; it is confusing.

> Thanks,
>
>         Ingo