Re: [PATCH V2 0/3] Fix EFI memory map leaks

2018-12-04 Thread Ard Biesheuvel
On Wed, 5 Dec 2018 at 08:41, Ingo Molnar  wrote:
>
>
> * Sai Praneeth Prakhya  wrote:
>
> > Presently, in EFI subsystem of kernel, every time kernel allocates memory 
> > for a
> > new EFI memory map, it forgets to free the memory occupied by old EFI 
> > memory map.
> > It does clear the mappings though (using efi_memmap_unmap()), but forgets to
> > free up the memory. Also, there is another minor issue, where in the newly
> > allocated memory isn't freed, should remap fail.
> >
> > The first issue is addressed by adding efi_memmap_free() to 
> > efi_memmap_install()
> > and the second issue is addressed by pushing the remap code into
> > efi_memmap_alloc() and there by handling the failure condition.
> >
> > Memory allocated to EFI memory map is leaked in below functions and hence 
> > they
> > are modified to fix the issue. Functions that modify EFI memmap are:
> > 1. efi_clean_memmap(),
> > 2. efi_fake_memmap(),
> > 3. efi_arch_mem_reserve(),
> > 4. efi_free_boot_services(),
> > 5. and __efi_enter_virtual_mode()
> >
> > More detailed explanation:
> > --
> > A typical boot flow on EFI supported x86_64 machines might look something 
> > like
> > below
> > 1. EFI memory map is passed by firmware to kernel.
> > 2. Kernel does a memblock_reserve() on this memory
> >(see efi_memblock_x86_reserve_range()).
> > 3. This memory map is checked for invalid entries in efi_clean_memmap(). If 
> > any
> >invalid entries are found, they are omitted from EFI memory map but the
> >memory occupied by these invalid EFI memory descriptors isn't freed.
> > 3. To further process this memory map (see efi_fake_memmap(), 
> > efi_bgrt_init()
> >and efi_esrt_init()), kernel allocates memory using efi_memmap_alloc() 
> > and
> >copies the processed memory map to newly allocated memory but it forgets 
> > to
> >free memory occupied by old EFI memory map.
> > 4. Further, in efi_map_regions() the EFI memory map is processed again to
> >include only EFI memory descriptors of type Runtime Code/Data and Boot
> >Code/Data. Again, memory is allocated for this new memory map through
> >realloc_pages() and the old EFI memory map is not freed.
> > 5. After SetVirtualAddressMap() is done, the EFI memory map is processed 
> > again
> >to have only EFI memory descriptors of type Runtime Code/Data. Again, 
> > memory
> >is allocated for this new memory map through efi_memmap_alloc() and the 
> > old
> >EFI memory map is not freed.
>
> So it was only halfway through the changelog that I realized that by
> 'free the memory occupied by old EFI memory map' you didn't mean the EFI
> memory map itself (the system RAM described), but the EFI memory map
> *table*, this relatively small array of descriptors that we allocate and
> reallocate every now and then according to the limitations of the
> (largely non-existent yet) early boot memory management system...
>
> Could we please use much more precise terminology when referring to these
> entities, in changelogs and code alike? I.e.:
>
>  - 'EFI memory map' is the whole map and the contents
>
>  - 'EFI memory map table' is the table structure itself, without regard
>of its contents
>
>  - 'EFI memory map table entry/descriptor' is an entry in the table.
>
> This kind of clarity is what we are using on the e820 memory map side as
> well: we have struct e820_table and struct e820_entry.
>
> Ard, would you object to standardizing the EFI code in a similar fashion
> as well: efi_mem_table and efi_mem_entry?
>

I share your concern, but 'EFI memory map' already has an established
meaning beyond Linux, as the set of memory map entries describing the
address space to the OS. (for instance, the EFI boot service to
retrieve it is called GetMemoryMap())

I am not sure what 'the whole EFI memory map' would mean'. It
describes all of DRAM (used, unused or reserved) and on top of that,
some peripheral mappings that need to be virtually remapped by the OS
so that the runtime services can use them (e.g., RTC or flash). So
from UEFI spec pov, your notion of 'EFI memory map' is just the entire
memory map.

So if we are going to rename these things wholesale (which is fine
with me), I'd prefer it if we can drop the 'map' entirely.

EFI memory table
EFI memory table entry
EFI memory table descriptor

For things described by the EFI memory table, we need to be more
specific anyway, since it typically describes everything, so

EFI boot services area
EFI runtime services area
EFI reserved area
EFI conventional memory area
EFI MMIO area

etc etc

Since we're being pedantic, it also makes sense to decide now whether
'area' refers to all [discontiguous] regions or just one of them. I'd
say the former, and use 'region' for the latter, i.e., an area may be
made up of several regions, but only one exists of each type.

> Parameters and local variables should be named unambiguously following
> these concepts as well. There should be no opaque 'new' 

Re: [PATCH V2 0/3] Fix EFI memory map leaks

2018-12-04 Thread Ingo Molnar


* Sai Praneeth Prakhya  wrote:

> Presently, in EFI subsystem of kernel, every time kernel allocates memory for 
> a
> new EFI memory map, it forgets to free the memory occupied by old EFI memory 
> map.
> It does clear the mappings though (using efi_memmap_unmap()), but forgets to
> free up the memory. Also, there is another minor issue, where in the newly
> allocated memory isn't freed, should remap fail.
> 
> The first issue is addressed by adding efi_memmap_free() to 
> efi_memmap_install()
> and the second issue is addressed by pushing the remap code into
> efi_memmap_alloc() and there by handling the failure condition.
> 
> Memory allocated to EFI memory map is leaked in below functions and hence they
> are modified to fix the issue. Functions that modify EFI memmap are:
> 1. efi_clean_memmap(),
> 2. efi_fake_memmap(),
> 3. efi_arch_mem_reserve(),
> 4. efi_free_boot_services(),
> 5. and __efi_enter_virtual_mode()
> 
> More detailed explanation:
> --
> A typical boot flow on EFI supported x86_64 machines might look something like
> below
> 1. EFI memory map is passed by firmware to kernel.
> 2. Kernel does a memblock_reserve() on this memory
>(see efi_memblock_x86_reserve_range()).
> 3. This memory map is checked for invalid entries in efi_clean_memmap(). If 
> any
>invalid entries are found, they are omitted from EFI memory map but the
>memory occupied by these invalid EFI memory descriptors isn't freed.
> 3. To further process this memory map (see efi_fake_memmap(), efi_bgrt_init()
>and efi_esrt_init()), kernel allocates memory using efi_memmap_alloc() and
>copies the processed memory map to newly allocated memory but it forgets to
>free memory occupied by old EFI memory map.
> 4. Further, in efi_map_regions() the EFI memory map is processed again to
>include only EFI memory descriptors of type Runtime Code/Data and Boot
>Code/Data. Again, memory is allocated for this new memory map through
>realloc_pages() and the old EFI memory map is not freed.
> 5. After SetVirtualAddressMap() is done, the EFI memory map is processed again
>to have only EFI memory descriptors of type Runtime Code/Data. Again, 
> memory
>is allocated for this new memory map through efi_memmap_alloc() and the old
>EFI memory map is not freed.

So it was only halfway through the changelog that I realized that by 
'free the memory occupied by old EFI memory map' you didn't mean the EFI 
memory map itself (the system RAM described), but the EFI memory map 
*table*, this relatively small array of descriptors that we allocate and 
reallocate every now and then according to the limitations of the 
(largely non-existent yet) early boot memory management system...

Could we please use much more precise terminology when referring to these 
entities, in changelogs and code alike? I.e.:

 - 'EFI memory map' is the whole map and the contents

 - 'EFI memory map table' is the table structure itself, without regard 
   of its contents

 - 'EFI memory map table entry/descriptor' is an entry in the table.

This kind of clarity is what we are using on the e820 memory map side as 
well: we have struct e820_table and struct e820_entry.

Ard, would you object to standardizing the EFI code in a similar fashion 
as well: efi_mem_table and efi_mem_entry?

Parameters and local variables should be named unambiguously following 
these concepts as well. There should be no opaque 'new' variables 
*especially* where the primary role of the efi_memmap_insert() API call 
is to add a new *entry* - it should be 'new_efi_map_table' (or 
'new_table' where it's unambiguous) instead, and new entries added should 
be 'new_entry' - etc.

Once this is reorganized and clarified it should be a lot more easy to 
review 'table freeing' patches. I can help out with patches if there's no 
conceptual objections.

Thanks,

Ingo


Re: [PATCH V2 2/3] x86/efi: Fix EFI memory map leaks

2018-12-04 Thread Ingo Molnar


* Sai Praneeth Prakhya  wrote:

>   if (efi_enabled(EFI_MEMMAP)) {
> + /*
> +  * efi_clean_memmap() uses memblock_phys_alloc() to allocate
> +  * memory for new EFI memmap and hence will work only after
> +  * e820__memblock_setup()
> +  */
> + efi_clean_memmap();
>   efi_fake_memmap();
>   efi_find_mirror();
>   efi_esrt_init();

I'd also suggest a namespace cleanup before we do any material 
modifications:

'efi_esrt_init()' is the proper pattern to follow, it's prefixed by efi_, 
then followed by the more generic subsystem (_esrt) and then by the 
functionality (_init). This is a good, hierarchical, top-down 
nomenclature that makes it easy to grep for ESRT functionality by typing 
'git grep esrt'.

The same is not true of the memmap functionality: 'git grep efi_memmap_' 
doesn't do the right thing.

So I think this should be renamed to:

efi_memmap_clean()
efi_memmap_insert()
efi_memmap_free()
efi_memmap_print()
efi_memmap_fake()
etc.

While at it, there's another area that could be improved:

 - efi_memmap_fake() is a bit weird naming: it's not really 'fake', 
   presumably the specified table is still very much real, otherwise it 
   won't result in a working kernel.

   What that functionality *really* is about is user-defined entries. Why 
   not name it in such a fashion? efi_memmap_init_user_defined() or so?

Thanks,

Ingo


Re: [PATCH V2 1/3] efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()

2018-12-04 Thread Ingo Molnar


* Sai Praneeth Prakhya  wrote:

> +/**
> + * efi_memmap_free - Free memory pointed by new_memmap.map
> + * @new_memmap: Structure that describes EFI memory map.
> + *
> + * Memory is freed depending on the type of allocation performed.
> + */
> +static void __init efi_memmap_free(struct efi_memory_map new_memmap)
> +{
> + phys_addr_t start, end;
> + unsigned long size = new_memmap.nr_map * new_memmap.desc_size;
> + unsigned int order = get_order(size);
> +
> + start = new_memmap.phys_map;
> + end = start + size;
> + if (new_memmap.late) {
> + __free_pages(pfn_to_page(PHYS_PFN(start)), order);
> + return;
> + }
> +
> + if (memblock_free(start, size))
> + pr_err("Failed to free mem from %pa to %pa\n", , );

Why is this rather large structure passed in by value and not by 
reference?

Also, 'new_memmap' naming is confusing - by this time this is a rather 
old memmap table we are going to free, right? So naming it 'memmap' would 
be fine, right?

In a similar vein the 'old_memmap' function parameter in 
efi_memmap_insert() should probably be renamed to 'memmap' too, in a 
separate patch. Also, I find efi_memmap_insert() rather confusing to read 
- would it be possible to clean it up? Here's the various problems I can 
see:

 - 'new' is rather confusingly named, and because it's a void its exact 
   purpose and role is not clear.

 - m_start/end_attr could be renamed to mem_start/end/attr - there's very 
   little point in abbreviating that.

 - All around the names do not help readability. Why is the new range 
   being inserted just named 'mem'? Why is the descriptor table we insert 
   it into named old_memmap but the actual *new* descriptor table passed 
   in via an opaque, misleadingly named void pointer named 'buf'?

etc.

This function needs a lot of help to make it reviewable easily, before we 
complicate the EFI memory descriptor table code with freeing logic...

Thanks,

Ingo


[PATCH V2 3/3] x86/efi: Use efi_memmap_() to create runtime EFI memory map

2018-12-04 Thread Sai Praneeth Prakhya
efi_map_regions() uses realloc_pages() to allocate memory for runtime EFI
memory map (EFI memory map which contains only memory descriptors of type
Runtime Code/Data and Boot Code/Data). Since efi_memmap_alloc() also does
the same, use it instead of realloc_pages() and install the new EFI memory
map using efi_memmap_install() instead of efi_memmap_init_late(). This also
fixes the leaking of existing EFI memory map.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h |  2 +-
 arch/x86/platform/efi/efi.c| 93 +-
 arch/x86/platform/efi/efi_32.c |  2 +-
 arch/x86/platform/efi/efi_64.c |  7 ++-
 4 files changed, 33 insertions(+), 71 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 744f945a00e7..524fda68b03f 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -131,7 +131,7 @@ extern void __init efi_map_region(efi_memory_desc_t *md);
 extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
-extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned 
num_pages);
+extern int __init efi_setup_page_tables(void);
 extern void __init old_map_region(efi_memory_desc_t *md);
 extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 63885cc8e34e..1b0a9449096b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -656,27 +656,6 @@ static void __init get_systab_virt_addr(efi_memory_desc_t 
*md)
}
 }
 
-static void *realloc_pages(void *old_memmap, int old_shift)
-{
-   void *ret;
-
-   ret = (void *)__get_free_pages(GFP_KERNEL, old_shift + 1);
-   if (!ret)
-   goto out;
-
-   /*
-* A first-time allocation doesn't have anything to copy.
-*/
-   if (!old_memmap)
-   return ret;
-
-   memcpy(ret, old_memmap, PAGE_SIZE << old_shift);
-
-out:
-   free_pages((unsigned long)old_memmap, old_shift);
-   return ret;
-}
-
 /*
  * Iterate the EFI memory map in reverse order because the regions
  * will be mapped top-down. The end result is the same as if we had
@@ -782,18 +761,15 @@ static bool should_map_region(efi_memory_desc_t *md)
 }
 
 /*
- * Map the efi memory ranges of the runtime services and update new_mmap with
- * virtual addresses.
+ * Map the efi memory ranges of the runtime services and update memory map with
+ * virtual addresses. Returns number of memory map entries mapped.
  */
-static void * __init efi_map_regions(int *count, int *pg_shift)
+static int __init efi_map_regions(void)
 {
-   void *p, *new_memmap = NULL;
-   unsigned long left = 0;
-   unsigned long desc_size;
+   void *p;
+   int count = 0;
efi_memory_desc_t *md;
 
-   desc_size = efi.memmap.desc_size;
-
p = NULL;
while ((p = efi_map_next_entry(p))) {
md = p;
@@ -803,30 +779,15 @@ static void * __init efi_map_regions(int *count, int 
*pg_shift)
 
efi_map_region(md);
get_systab_virt_addr(md);
-
-   if (left < desc_size) {
-   new_memmap = realloc_pages(new_memmap, *pg_shift);
-   if (!new_memmap)
-   return NULL;
-
-   left += PAGE_SIZE << *pg_shift;
-   (*pg_shift)++;
-   }
-
-   memcpy(new_memmap + (*count * desc_size), md, desc_size);
-
-   left -= desc_size;
-   (*count)++;
+   count++;
}
-
-   return new_memmap;
+   return count;
 }
 
 static void __init kexec_enter_virtual_mode(void)
 {
 #ifdef CONFIG_KEXEC_CORE
efi_memory_desc_t *md;
-   unsigned int num_pages;
 
efi.systab = NULL;
 
@@ -872,10 +833,7 @@ static void __init kexec_enter_virtual_mode(void)
 
BUG_ON(!efi.systab);
 
-   num_pages = ALIGN(efi.memmap.nr_map * efi.memmap.desc_size, PAGE_SIZE);
-   num_pages >>= PAGE_SHIFT;
-
-   if (efi_setup_page_tables(efi.memmap.phys_map, num_pages)) {
+   if (efi_setup_page_tables()) {
clear_bit(EFI_RUNTIME_SERVICES, );
return;
}
@@ -926,10 +884,12 @@ static void __init kexec_enter_virtual_mode(void)
  */
 static void __init __efi_enter_virtual_mode(void)
 {
-   int count = 0, pg_shift = 0;
-   void *new_memmap = NULL;
+   struct efi_memory_map new_memmap;
+   efi_memory_desc_t *md;
+   int count = 0;
efi_status_t status;
unsigned long pa;
+   void *out;
 
efi.systab = NULL;
 
@@ -940,28 +900,25 @@ static void __init __efi_enter_virtual_mode(void)
}
 

[PATCH V2 1/3] efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()

2018-12-04 Thread Sai Praneeth Prakhya
Presently, in EFI subsystem of kernel, every time kernel allocates memory
for a new EFI memory map, it forgets to free the memory occupied by old EFI
memory map. Hence, introduce efi_memmap_free() that frees up the memory
occupied by an EFI memory map.

Introduce __efi_memmap_unmap(), so that it could be used to unmap an EFI
memory map and have wrappers around it (namely efi_memmap_unmap() and
efi_memmap_unmap_and_free()) to specifically deal with efi.memmap. There
are two variants of wrappers (unmap and free) because there are use cases
where the kernel just needs to unmap the memory map (see efi_init() in arm
and kexec_enter_virtual_mode()) but not free it.

Apart from introducing the above functions, improve the cases where the
kernel decides to turn off EFI runtime services during boot by unmapping
and freeing the EFI memory map rather than just unmapping the EFI memory
map.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/efi.c |  4 +-
 arch/x86/platform/efi/quirks.c  |  2 +-
 drivers/firmware/efi/arm-init.c |  2 +-
 drivers/firmware/efi/memmap.c   | 72 +
 include/linux/efi.h |  1 +
 5 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e1cb01a22fa8..715601d1c581 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -532,7 +532,7 @@ void __init efi_init(void)
pr_info("No EFI runtime due to 32/64-bit mismatch with 
kernel\n");
else {
if (efi_runtime_disabled() || efi_runtime_init()) {
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
return;
}
}
@@ -833,7 +833,7 @@ static void __init kexec_enter_virtual_mode(void)
 * have been mapped at these virtual addresses.
 */
if (!efi_is_native() || efi_enabled(EFI_OLD_MEMMAP)) {
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
clear_bit(EFI_RUNTIME_SERVICES, );
return;
}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 09e811b9da26..ce6dcd40dd6c 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -556,7 +556,7 @@ void __init efi_apply_memmap_quirks(void)
 */
if (!efi_runtime_supported()) {
pr_info("Setup done, disabling due to 32/64-bit mismatch\n");
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
}
 
/* UV2+ BIOS has a fix for this issue.  UV1 still needs the quirk. */
diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 1a6a77df8a5e..f32ff5c580f6 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -253,7 +253,7 @@ void __init efi_init(void)
  efi.memmap.desc_version);
 
if (uefi_init() < 0) {
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
return;
}
 
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 38b686c67b17..4318a69bdbbf 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -49,6 +49,29 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
return __efi_memmap_alloc_early(size);
 }
 
+/**
+ * efi_memmap_free - Free memory pointed by new_memmap.map
+ * @new_memmap: Structure that describes EFI memory map.
+ *
+ * Memory is freed depending on the type of allocation performed.
+ */
+static void __init efi_memmap_free(struct efi_memory_map new_memmap)
+{
+   phys_addr_t start, end;
+   unsigned long size = new_memmap.nr_map * new_memmap.desc_size;
+   unsigned int order = get_order(size);
+
+   start = new_memmap.phys_map;
+   end = start + size;
+   if (new_memmap.late) {
+   __free_pages(pfn_to_page(PHYS_PFN(start)), order);
+   return;
+   }
+
+   if (memblock_free(start, size))
+   pr_err("Failed to free mem from %pa to %pa\n", , );
+}
+
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
@@ -116,21 +139,56 @@ int __init efi_memmap_init_early(struct 
efi_memory_map_data *data)
return __efi_memmap_init(data, false);
 }
 
+/**
+ * __efi_memmap_unmap - Unmap the region pointed by new_memmap.map
+ * @new_memmap: Structure that describes EFI memory map.
+ *
+ * Use to unmap *newly* created EFI memmap and should *not* be used directly to
+ * unmap efi.memmap because "EFI_MEMMAP" flag is not cleared here. Instead, use
+ * efi_memmap_unmap*() variants accordingly. Also, the check for "EFI_MEMMAP"
+ * flag is done in efi_memmap_unmap*() variants because they are the ones who
+ * unmap 

[PATCH V2 2/3] x86/efi: Fix EFI memory map leaks

2018-12-04 Thread Sai Praneeth Prakhya
Presently, in efi subsystem of kernel, every time kernel allocates memory
for a new EFI memory map, it forgets to free the memory occupied by the
existing EFI memory map. This could be fixed by unmapping and freeing the
existing EFI memory map every time before installing a new EFI memory map.
Hence, modify efi_memmap_install() accordingly since it's the only place
which installs a new EFI memory map.

Presently, efi_memmap_alloc() allocates only physical memory and every
caller of efi_memmap_alloc() should remap the newly allocated memory in
order to use it. This extra step could sometimes lead to buggy error
handling conditions where in the allocated memory isn't freed should remap
fail. So, push the remap logic into efi_memmap_alloc() so that the error
handling could be improved and it also makes the caller look simpler.

With the modified efi_memmap_alloc() and efi_memmap_install() API's, a
typical flow to install a new EFI memory map would look something like
below.

1. Get the number of entries the new EFI memory map should have (typically
   through efi_memmap_split_count()).
2. Allocate memory for the new EFI memory map (efi_memmap_alloc()).
3. Populate memory descriptor entries in the new EFI memory map.
4. Install the new EFI memory map (efi_memmap_install() which also unmaps
   and frees existing memory map).

Existing functions like efi_clean_memmap(), efi_arch_mem_reserve(),
efi_free_boot_services() and efi_fake_memmap() are modified to fix the
above mentioned bugs and also to follow the above recommended usage of
API's.

Note that efi_clean_memmap() could be implemented without allocating any
new memory, but since this is not fast path and hence is not a concern for
performance, readability and maintainability wins. So, change it to use
efi_memmap_alloc() and efi_memmap_install().

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |   1 +
 arch/x86/kernel/setup.c |   6 ++
 arch/x86/platform/efi/efi.c |  44 ++--
 arch/x86/platform/efi/quirks.c  |  43 +++-
 drivers/firmware/efi/fake_mem.c |  21 ++
 drivers/firmware/efi/memmap.c   | 118 +++-
 include/linux/efi.h |   7 +-
 7 files changed, 132 insertions(+), 108 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index d1e64ac80b9c..744f945a00e7 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,6 +143,7 @@ extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
 extern void efi_free_boot_services(void);
 extern void efi_reserve_boot_services(void);
+extern void __init efi_clean_memmap(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b74e7bfed6ab..bed79b238b0d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1102,6 +1102,12 @@ void __init setup_arch(char **cmdline_p)
reserve_bios_regions();
 
if (efi_enabled(EFI_MEMMAP)) {
+   /*
+* efi_clean_memmap() uses memblock_phys_alloc() to allocate
+* memory for new EFI memmap and hence will work only after
+* e820__memblock_setup()
+*/
+   efi_clean_memmap();
efi_fake_memmap();
efi_find_mirror();
efi_esrt_init();
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 715601d1c581..63885cc8e34e 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -249,30 +249,36 @@ static bool __init efi_memmap_entry_valid(const 
efi_memory_desc_t *md, int i)
return false;
 }
 
-static void __init efi_clean_memmap(void)
+void __init efi_clean_memmap(void)
 {
-   efi_memory_desc_t *out = efi.memmap.map;
-   const efi_memory_desc_t *in = out;
-   const efi_memory_desc_t *end = efi.memmap.map_end;
-   int i, n_removal;
-
-   for (i = n_removal = 0; in < end; i++) {
-   if (efi_memmap_entry_valid(in, i)) {
-   if (out != in)
-   memcpy(out, in, efi.memmap.desc_size);
-   out = (void *)out + efi.memmap.desc_size;
-   } else {
+   void *out;
+   efi_memory_desc_t *md;
+   unsigned int i = 0, n_removal = 0;
+   struct efi_memory_map new_memmap;
+
+   for_each_efi_memory_desc(md) {
+   if (!efi_memmap_entry_valid(md, i))
n_removal++;
-   }
-   in = (void *)in + efi.memmap.desc_size;
}
 
-   if (n_removal > 0) {
-   u64 size = efi.memmap.nr_map - n_removal;
+   if (n_removal == 0)
+   return;
 
-   pr_warn("Removing %d invalid memory map entries.\n", n_removal);
- 

[PATCH V2 0/3] Fix EFI memory map leaks

2018-12-04 Thread Sai Praneeth Prakhya
Presently, in EFI subsystem of kernel, every time kernel allocates memory for a
new EFI memory map, it forgets to free the memory occupied by old EFI memory 
map.
It does clear the mappings though (using efi_memmap_unmap()), but forgets to
free up the memory. Also, there is another minor issue, where in the newly
allocated memory isn't freed, should remap fail.

The first issue is addressed by adding efi_memmap_free() to efi_memmap_install()
and the second issue is addressed by pushing the remap code into
efi_memmap_alloc() and there by handling the failure condition.

Memory allocated to EFI memory map is leaked in below functions and hence they
are modified to fix the issue. Functions that modify EFI memmap are:
1. efi_clean_memmap(),
2. efi_fake_memmap(),
3. efi_arch_mem_reserve(),
4. efi_free_boot_services(),
5. and __efi_enter_virtual_mode()

More detailed explanation:
--
A typical boot flow on EFI supported x86_64 machines might look something like
below
1. EFI memory map is passed by firmware to kernel.
2. Kernel does a memblock_reserve() on this memory
   (see efi_memblock_x86_reserve_range()).
3. This memory map is checked for invalid entries in efi_clean_memmap(). If any
   invalid entries are found, they are omitted from EFI memory map but the
   memory occupied by these invalid EFI memory descriptors isn't freed.
3. To further process this memory map (see efi_fake_memmap(), efi_bgrt_init()
   and efi_esrt_init()), kernel allocates memory using efi_memmap_alloc() and
   copies the processed memory map to newly allocated memory but it forgets to
   free memory occupied by old EFI memory map.
4. Further, in efi_map_regions() the EFI memory map is processed again to
   include only EFI memory descriptors of type Runtime Code/Data and Boot
   Code/Data. Again, memory is allocated for this new memory map through
   realloc_pages() and the old EFI memory map is not freed.
5. After SetVirtualAddressMap() is done, the EFI memory map is processed again
   to have only EFI memory descriptors of type Runtime Code/Data. Again, memory
   is allocated for this new memory map through efi_memmap_alloc() and the old
   EFI memory map is not freed.

Testing:

Tested with LUV on qemu-x86_64 and on my dev machine. Checked for unchanged boot
behavior i.e. shouldn't break any existing stuff. Built for arm, arm64 and ia64
and found no new warnings/errors. Would appreciate the effort if someone could
test on arm machines.

Although majority of the changes are made to drivers/firmware/efi/memmap.c file
(which is common across architectures), this bug is only limited to x86_64
machines and hence this patch set shouldn't effect any other architectures.

Notes:
--
1. This patch set is based on EFI tree's "next" branch [1].
2. This patch set is an outcome of the discussion at [2].

Changes from V1:

1. Drop passing around allocation type from efi_memmap_alloc(), instead change
   efi_memmap_alloc() such that it now returns a populated struct efi_memory_map
2. Drop fixing issues in efi_fake_memmap(), will be addressed in a separate 
patch.
3. Optimize efi_map_regions().

[1] git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
[2] https://lkml.org/lkml/2018/7/2/1095

Sai Praneeth Prakhya (3):
  efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()
  x86/efi: Fix EFI memory map leaks
  x86/efi: Use efi_memmap_() to create runtime EFI memory
map

 arch/x86/include/asm/efi.h  |   3 +-
 arch/x86/kernel/setup.c |   6 +
 arch/x86/platform/efi/efi.c | 141 +---
 arch/x86/platform/efi/efi_32.c  |   2 +-
 arch/x86/platform/efi/efi_64.c  |   7 +-
 arch/x86/platform/efi/quirks.c  |  45 ++--
 drivers/firmware/efi/arm-init.c |   2 +-
 drivers/firmware/efi/fake_mem.c |  21 +---
 drivers/firmware/efi/memmap.c   | 190 +---
 include/linux/efi.h |   8 +-
 10 files changed, 235 insertions(+), 190 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 

-- 
2.19.1



Re: [PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-12-04 Thread Lendacky, Thomas
On 11/29/2018 09:37 PM, Dave Young wrote:
> + more people
> 
> On 11/29/18 at 04:09pm, Lianbo Jiang wrote:
>> When doing kexec_file_load, the first kernel needs to pass the e820
>> reserved ranges to the second kernel. But kernel can not exactly
>> match the e820 reserved ranges when walking through the iomem resources
>> with the descriptor 'IORES_DESC_NONE', because several e820 types(
>> e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
>> _TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
>> may pass these four types to the kdump kernel, that is not desired result.
>>
>> So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
>> for the iomem resources search interfaces. It is helpful to exactly
>> match the reserved resource ranges when walking through iomem resources.
>>
>> In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
>> these code originally related to the descriptor 'IORES_DESC_NONE' need to
>> be updated. Otherwise, it will be easily confused and also cause some
>> errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
>> descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
>> changed.
>>
>> Suggested-by: Dave Young 
>> Signed-off-by: Lianbo Jiang 
>> ---
>>  arch/ia64/kernel/efi.c |  4 
>>  arch/x86/kernel/e820.c |  2 +-
>>  arch/x86/mm/ioremap.c  | 13 -
>>  include/linux/ioport.h |  1 +
>>  kernel/resource.c  |  6 +++---
>>  5 files changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
>> index 8f106638913c..1841e9b4db30 100644
>> --- a/arch/ia64/kernel/efi.c
>> +++ b/arch/ia64/kernel/efi.c
>> @@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
>> *code_resource,
>>  break;
>>  
>>  case EFI_RESERVED_TYPE:
>> +name = "reserved";
> 
> Ingo updated X86 code to use "Reserved",  I think it would be good to do
> same for this case as well
> 
>> +desc = IORES_DESC_RESERVED;
>> +break;
>> +
>>  case EFI_RUNTIME_SERVICES_CODE:
>>  case EFI_RUNTIME_SERVICES_DATA:
>>  case EFI_ACPI_RECLAIM_MEMORY:
> 
> Originally, above 3 are all "reserved", so probably they all should be
> IORES_DESC_RESERVED.
> 
> Can any IA64 people to review this?
> 
>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>> index 50895c2f937d..57fafdafb860 100644
>> --- a/arch/x86/kernel/e820.c
>> +++ b/arch/x86/kernel/e820.c
>> @@ -1048,10 +1048,10 @@ static unsigned long __init 
>> e820_type_to_iores_desc(struct e820_entry *entry)
>>  case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
>>  case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
>>  case E820_TYPE_PRAM:return 
>> IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>> +case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
>>  case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>>  case E820_TYPE_RAM: /* Fall-through: */
>>  case E820_TYPE_UNUSABLE:/* Fall-through: */
>> -case E820_TYPE_RESERVED:/* Fall-through: */
>>  default:return IORES_DESC_NONE;
>>  }
>>  }
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 5378d10f1d31..fea2ef99415d 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
>>  
>>  static int __ioremap_check_desc_other(struct resource *res)
>>  {
>> -return (res->desc != IORES_DESC_NONE);
>> +/*
>> + * But now, the 'E820_TYPE_RESERVED' type is converted to the new
>> + * descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
>> + * it has been changed. And the value of 'mem_flags.desc_other'
>> + * is equal to 'true' if we don't strengthen the condition in this
>> + * function, that is wrong. Because originally it is equal to
>> + * 'false' for the same reserved type.
>> + *
>> + * So, that would be nice to keep it the same as before.
>> + */
>> +return ((res->desc != IORES_DESC_NONE) &&
>> +(res->desc != IORES_DESC_RESERVED));
>>  }
> 
> Added Tom since he added the check function.  Is it possible to only
> check explict valid desc types instead of exclude IORES_DESC_NONE?

Sorry for the delay...

The original intent of the check was to map most memory as encrypted under
SEV if it was marked with a specific descriptor, since it was likely to
not be MMIO. I tried converting most things that mapped memory to memremap
vs ioremap, but ACPI was one area that I left alone and this check catches
the mapping of the ACPI tables. I suppose it's possible to change this to
check just for IORES_DESC_ACPI_* values, but I would have to do some
testing.

Thanks,

Re: [PATCH] pstore: Convert buf_lock to semaphore

2018-12-04 Thread Sebastian Andrzej Siewior
On 2018-12-04 09:23:13 [-0800], Kees Cook wrote:
> Okay, so, if kmsg_dump() uses rcu_read_lock(), that means efi-pstore
> can _never_ sleep, and it's nothing to do with pstore internals. :( I
> guess we just hard-code it, then? And efi-pstore should probably only
> attach to pstore if it has a nonblock implementation (and warn if one
> isn't available).

I was about to suggest that. I am not aware if anything else could use
efi_pstore_write() use that but otherwise you could hardcode it.

> -Kees
> 
Sebastian


Re: [PATCH] pstore: Convert buf_lock to semaphore

2018-12-04 Thread Kees Cook
On Tue, Dec 4, 2018 at 7:41 AM Sebastian Andrzej Siewior
 wrote:
>
> On 2018-11-30 14:47:36 [-0800], Kees Cook wrote:
> > diff --git a/drivers/firmware/efi/efi-pstore.c 
> > b/drivers/firmware/efi/efi-pstore.c
> > index cfe87b465819..0f7d97917197 100644
> > --- a/drivers/firmware/efi/efi-pstore.c
> > +++ b/drivers/firmware/efi/efi-pstore.c
> > @@ -259,8 +259,7 @@ static int efi_pstore_write(struct pstore_record 
> > *record)
> >   efi_name[i] = name[i];
> >
> >   ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES,
> > -   !pstore_cannot_block_path(record->reason),
> > -   record->size, record->psi->buf);
> > +   preemptible(), record->size, record->psi->buf);
>
> Well. Better I think.
> might_sleep() / preempt_count_equals() checks for preemptible() + 
> rcu_preempt_depth().
> kmsg_dump() starts with rcu_read_lock() which means with this patch applied I
> got:

Okay, so, if kmsg_dump() uses rcu_read_lock(), that means efi-pstore
can _never_ sleep, and it's nothing to do with pstore internals. :( I
guess we just hard-code it, then? And efi-pstore should probably only
attach to pstore if it has a nonblock implementation (and warn if one
isn't available).

-Kees

-- 
Kees Cook


Re: [PATCH] pstore: Convert buf_lock to semaphore

2018-12-04 Thread Kees Cook
On Tue, Dec 4, 2018 at 7:41 AM Sebastian Andrzej Siewior
 wrote:
>
> On 2018-11-30 14:47:36 [-0800], Kees Cook wrote:
> > diff --git a/drivers/firmware/efi/efi-pstore.c 
> > b/drivers/firmware/efi/efi-pstore.c
> > index cfe87b465819..0f7d97917197 100644
> > --- a/drivers/firmware/efi/efi-pstore.c
> > +++ b/drivers/firmware/efi/efi-pstore.c
> > @@ -259,8 +259,7 @@ static int efi_pstore_write(struct pstore_record 
> > *record)
> >   efi_name[i] = name[i];
> >
> >   ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES,
> > -   !pstore_cannot_block_path(record->reason),
> > -   record->size, record->psi->buf);
> > +   preemptible(), record->size, record->psi->buf);
>
> Well. Better I think.
> might_sleep() / preempt_count_equals() checks for preemptible() + 
> rcu_preempt_depth().
> kmsg_dump() starts with rcu_read_lock() which means with this patch applied I
> got:
>
> | BUG: sleeping function called from invalid context at 
> kernel/sched/completion.c:99
> | in_atomic(): 0, irqs_disabled(): 0, pid: 2286, name: sig-xstate-bum PC: 0 
> RCU: 1
> | Preemption disabled at:
> | [] __queue_work+0x95/0x440
> | CPU: 30 PID: 2286 Comm: sig-xstate-bum Tainted: G  D   
> 4.20.0-rc3+ #90
> | Call Trace:
> |  dump_stack+0x4f/0x6a
> |  ___might_sleep.cold.91+0xef/0x100
> |  __might_sleep+0x50/0x90
> |  wait_for_completion+0x32/0x130
> |  virt_efi_query_variable_info+0x14e/0x160
> |  efi_query_variable_store+0x51/0x1a0
> |  efivar_entry_set_safe+0xa3/0x1b0
> |  efi_pstore_write+0x110/0x140
> |  pstore_dump+0x114/0x320
> |  kmsg_dump+0xa4/0xd0
> |  oops_exit+0x7f/0x90
> |  oops_end+0x67/0xd0
> |  die+0x41/0x4a
> |  do_general_protection+0xc1/0x150
> |  general_protection+0x1e/0x30
> | RIP: 0010:__fpu__restore_sig+0x1c1/0x540
>
> just in case you wonder why both counter are zero and it still creates
> this backtrace.

Oh, hmm. That didn't show up in my testing. Any thoughts on a solution?

>
> >   if (record->reason == KMSG_DUMP_OOPS)
> >   efivar_run_worker();
> > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> > index 2387cb74f729..afdfd3687f94 100644
> > --- a/fs/pstore/platform.c
> > +++ b/fs/pstore/platform.c
> > @@ -400,23 +401,20 @@ static void pstore_dump(struct kmsg_dumper *dumper,
> >   unsigned long   total = 0;
> >   const char  *why;
> >   unsigned intpart = 1;
> > - unsigned long   flags = 0;
> > - int is_locked;
> >   int ret;
> >
> >   why = get_reason_str(reason);
> >
> > - if (pstore_cannot_block_path(reason)) {
> > - is_locked = spin_trylock_irqsave(>buf_lock, flags);
> > - if (!is_locked) {
> > - pr_err("pstore dump routine blocked in %s path, may 
> > corrupt error record\n"
> > -, in_nmi() ? "NMI" : why);
> > + if (down_trylock(>buf_lock)) {
> > + /* Failed to acquire lock: give up if we cannot wait. */
> > + if (pstore_cannot_wait(reason)) {
> > + pr_err("dump skipped in %s path: may corrupt error 
> > record\n",
> > + in_nmi() ? "NMI" : why);
> >   return;
> >   }
> > - } else {
> > - spin_lock_irqsave(>buf_lock, flags);
> > - is_locked = 1;
> > + down_interruptible(>buf_lock);
>
>  In function ‘pstore_dump’:
> fs/pstore/platform.c:393:3: warning: ignoring return value of 
> ‘down_interruptible’, declared with attribute warn_unused_result 
> [-Wunused-result]
>down_interruptible(>buf_lock);
>^
>

Thanks, yes. I've fixed this in the version in -next.

-Kees

-- 
Kees Cook


Re: [PATCH] pstore: Convert buf_lock to semaphore

2018-12-04 Thread Sebastian Andrzej Siewior
On 2018-11-30 14:47:36 [-0800], Kees Cook wrote:
> diff --git a/drivers/firmware/efi/efi-pstore.c 
> b/drivers/firmware/efi/efi-pstore.c
> index cfe87b465819..0f7d97917197 100644
> --- a/drivers/firmware/efi/efi-pstore.c
> +++ b/drivers/firmware/efi/efi-pstore.c
> @@ -259,8 +259,7 @@ static int efi_pstore_write(struct pstore_record *record)
>   efi_name[i] = name[i];
>  
>   ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES,
> -   !pstore_cannot_block_path(record->reason),
> -   record->size, record->psi->buf);
> +   preemptible(), record->size, record->psi->buf);

Well. Better I think.
might_sleep() / preempt_count_equals() checks for preemptible() + 
rcu_preempt_depth().
kmsg_dump() starts with rcu_read_lock() which means with this patch applied I
got:

| BUG: sleeping function called from invalid context at 
kernel/sched/completion.c:99
| in_atomic(): 0, irqs_disabled(): 0, pid: 2286, name: sig-xstate-bum PC: 0 
RCU: 1
| Preemption disabled at:
| [] __queue_work+0x95/0x440
| CPU: 30 PID: 2286 Comm: sig-xstate-bum Tainted: G  D   
4.20.0-rc3+ #90
| Call Trace:
|  dump_stack+0x4f/0x6a
|  ___might_sleep.cold.91+0xef/0x100
|  __might_sleep+0x50/0x90
|  wait_for_completion+0x32/0x130
|  virt_efi_query_variable_info+0x14e/0x160
|  efi_query_variable_store+0x51/0x1a0
|  efivar_entry_set_safe+0xa3/0x1b0
|  efi_pstore_write+0x110/0x140
|  pstore_dump+0x114/0x320
|  kmsg_dump+0xa4/0xd0
|  oops_exit+0x7f/0x90
|  oops_end+0x67/0xd0
|  die+0x41/0x4a
|  do_general_protection+0xc1/0x150
|  general_protection+0x1e/0x30
| RIP: 0010:__fpu__restore_sig+0x1c1/0x540

just in case you wonder why both counter are zero and it still creates
this backtrace.

>   if (record->reason == KMSG_DUMP_OOPS)
>   efivar_run_worker();
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
> index 2387cb74f729..afdfd3687f94 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c
> @@ -400,23 +401,20 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>   unsigned long   total = 0;
>   const char  *why;
>   unsigned intpart = 1;
> - unsigned long   flags = 0;
> - int is_locked;
>   int ret;
>  
>   why = get_reason_str(reason);
>  
> - if (pstore_cannot_block_path(reason)) {
> - is_locked = spin_trylock_irqsave(>buf_lock, flags);
> - if (!is_locked) {
> - pr_err("pstore dump routine blocked in %s path, may 
> corrupt error record\n"
> -, in_nmi() ? "NMI" : why);
> + if (down_trylock(>buf_lock)) {
> + /* Failed to acquire lock: give up if we cannot wait. */
> + if (pstore_cannot_wait(reason)) {
> + pr_err("dump skipped in %s path: may corrupt error 
> record\n",
> + in_nmi() ? "NMI" : why);
>   return;
>   }
> - } else {
> - spin_lock_irqsave(>buf_lock, flags);
> - is_locked = 1;
> + down_interruptible(>buf_lock);

 In function ‘pstore_dump’:
fs/pstore/platform.c:393:3: warning: ignoring return value of 
‘down_interruptible’, declared with attribute warn_unused_result 
[-Wunused-result]
   down_interruptible(>buf_lock);
   ^

>   }

Sebastian


Re: [tip:x86/fpu] x86/fpu: Don't export __kernel_fpu_{begin,end}()

2018-12-04 Thread Sebastian Andrzej Siewior
On 2018-12-03 23:08:41 [+0100], Borislav Petkov wrote:
> On Mon, Dec 03, 2018 at 10:12:19PM +0100, Ard Biesheuvel wrote:
> > > + * Using the FPU in hardirq is not allowed.
> > 
> > According to the documentation in x86/kernel/fpu/core.c, this is not
> > true. So which one is accurate?
> 
> I think you mean the irq from user mode... Yap, we do allow that.
> 
> Sebastian?

Do you refer to
| *   - by IRQ context code to potentially use the FPU
| * if it's unused.
  
? It is possible to use the FPU in IRQ context.
The FPU could be used in user-context surrounded by kernel_fpu_begin().
This only disables preemption so an IRQ could interrupt it. This IRQ
could then use the FPU or raise a SoftIRQ which would use it.
Therefore on x86 it is required to check with irq_fpu_usable() if the
FPU can be used. If the FPU can not be used, you have to implement
fallback code.

With the "restore FPU on return to userland" series we need to modify
the FPU in a few places. The softirq and preemption is disabled. I
didn't find any in-IRQ users.
Going forward I would like to remove the in-IRQ part and
irq_fpu_usable() and disable softirq as part of kernel_fpu_begin().

> Thx.

Sebastian


[tip:x86/fpu] x86/fpu: Don't export __kernel_fpu_{begin,end}()

2018-12-04 Thread tip-bot for Sebastian Andrzej Siewior
Commit-ID:  12209993e98c5fa1855c467f22a24e3d5b8be205
Gitweb: https://git.kernel.org/tip/12209993e98c5fa1855c467f22a24e3d5b8be205
Author: Sebastian Andrzej Siewior 
AuthorDate: Thu, 29 Nov 2018 16:02:10 +0100
Committer:  Borislav Petkov 
CommitDate: Tue, 4 Dec 2018 12:37:28 +0100

x86/fpu: Don't export __kernel_fpu_{begin,end}()

There is one user of __kernel_fpu_begin() and before invoking it,
it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
right away. The 32bit version of arch_efi_call_virt_setup() and
arch_efi_call_virt_teardown() does this already.

The comment above *kernel_fpu*() claims that before invoking
__kernel_fpu_begin() preemption should be disabled and that KVM is a
good example of doing it. Well, KVM doesn't do that since commit

  f775b13eedee2 ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")

so it is not an example anymore.

With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
be made static and not exported anymore.

Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Borislav Petkov 
Reviewed-by: Rik van Riel 
Cc: "H. Peter Anvin" 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Dave Hansen 
Cc: Ingo Molnar 
Cc: Nicolai Stange 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: kvm ML 
Cc: linux-efi 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20181129150210.2k4mawt37ow6c...@linutronix.de
---
 arch/x86/include/asm/efi.h |  6 ++
 arch/x86/include/asm/fpu/api.h | 15 +--
 arch/x86/kernel/fpu/core.c |  6 ++
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..45864898f7e5 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -82,8 +82,7 @@ struct efi_scratch {
 #define arch_efi_call_virt_setup() \
 ({ \
efi_sync_low_kernel_mappings(); \
-   preempt_disable();  \
-   __kernel_fpu_begin();   \
+   kernel_fpu_begin(); \
firmware_restrict_branch_speculation_start();   \
\
if (!efi_enabled(EFI_OLD_MEMMAP))   \
@@ -99,8 +98,7 @@ struct efi_scratch {
efi_switch_mm(efi_scratch.prev_mm); \
\
firmware_restrict_branch_speculation_end(); \
-   __kernel_fpu_end(); \
-   preempt_enable();   \
+   kernel_fpu_end();   \
 })
 
 extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index a9caac9d4a72..b56d504af654 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,17 +12,12 @@
 #define _ASM_X86_FPU_API_H
 
 /*
- * Careful: __kernel_fpu_begin/end() must be called with preempt disabled
- * and they don't touch the preempt state on their own.
- * If you enable preemption after __kernel_fpu_begin(), preempt notifier
- * should call the __kernel_fpu_end() to prevent the kernel/user FPU
- * state from getting corrupted. KVM for example uses this model.
- *
- * All other cases use kernel_fpu_begin/end() which disable preemption
- * during kernel FPU usage.
+ * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
+ * disables preemption so be careful if you intend to use it for long periods
+ * of time.
+ * If you intend to use the FPU in softirq you need to check first with
+ * irq_fpu_usable() if it is possible.
  */
-extern void __kernel_fpu_begin(void);
-extern void __kernel_fpu_end(void);
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 extern bool irq_fpu_usable(void);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2ea85b32421a..2e5003fef51a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -93,7 +93,7 @@ bool irq_fpu_usable(void)
 }
 EXPORT_SYMBOL(irq_fpu_usable);
 
-void __kernel_fpu_begin(void)
+static void __kernel_fpu_begin(void)
 {
struct fpu *fpu = >thread.fpu;
 
@@ -111,9 +111,8 @@ void __kernel_fpu_begin(void)
__cpu_invalidate_fpregs_state();
}
 }
-EXPORT_SYMBOL(__kernel_fpu_begin);
 
-void __kernel_fpu_end(void)
+static void __kernel_fpu_end(void)
 {
struct fpu *fpu = >thread.fpu;
 
@@ -122,7 +121,6 @@ void __kernel_fpu_end(void)
 
kernel_fpu_enable();
 }
-EXPORT_SYMBOL(__kernel_fpu_end);
 
 void 

Re: [tip:x86/fpu] x86/fpu: Don't export __kernel_fpu_{begin,end}()

2018-12-04 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 11:08:41PM +0100, Borislav Petkov wrote:
> On Mon, Dec 03, 2018 at 10:12:19PM +0100, Ard Biesheuvel wrote:
> > > + * Using the FPU in hardirq is not allowed.
> > 
> > According to the documentation in x86/kernel/fpu/core.c, this is not
> > true. So which one is accurate?
> 
> I think you mean the irq from user mode... Yap, we do allow that.
> 
> Sebastian?

Ok, I've zapped that sentence from the comment until we clarify what
Sebastian meant.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH 4.9 49/50] efi/libstub: Make file I/O chunking x86-specific

2018-12-04 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Ard Biesheuvel 

(commit b3879a4d3a31ef14265a52e8d941cf4b0f6627ae upstream)

The ARM decompressor is finicky when it comes to uninitialized variables
with local linkage, the reason being that it may relocate .text and .bss
independently when executing from ROM. This is only possible if all
references into .bss from .text are absolute, and this happens to be the
case for references emitted under -fpic to symbols with external linkage,
and so all .bss references must involve symbols with external linkage.

When building the ARM stub using clang, the initialized local variable
__chunk_size is optimized into a zero-initialized flag that indicates
whether chunking is in effect or not. This flag is therefore emitted into
.bss, which triggers the ARM decompressor's diagnostics, resulting in a
failed build.

Under UEFI, we never execute the decompressor from ROM, so the diagnostic
makes little sense here. But we can easily work around the issue by making
__chunk_size global instead.

However, given that the file I/O chunking that is controlled by the
__chunk_size variable is intended to work around known bugs on various
x86 implementations of UEFI, we can simply make the chunking an x86
specific feature. This is an improvement by itself, and also removes the
need to parse the efi= options in the stub entirely.

Tested-by: Arnd Bergmann 
Signed-off-by: Ard Biesheuvel 
Reviewed-by: Matt Fleming 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-efi@vger.kernel.org
Link: 
http://lkml.kernel.org/r/1486380166-31868-8-git-send-email-ard.biesheu...@linaro.org
[ Small readability edits. ]
Signed-off-by: Ingo Molnar 
Signed-off-by: Nick Desaulniers 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/firmware/efi/libstub/efi-stub-helper.c |   11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -356,6 +356,14 @@ efi_status_t efi_parse_options(char *cmd
char *str;
 
/*
+* Currently, the only efi= option we look for is 'nochunk', which
+* is intended to work around known issues on certain x86 UEFI
+* versions. So ignore for now on other architectures.
+*/
+   if (!IS_ENABLED(CONFIG_X86))
+   return EFI_SUCCESS;
+
+   /*
 * If no EFI parameters were specified on the cmdline we've got
 * nothing to do.
 */
@@ -528,7 +536,8 @@ efi_status_t handle_cmdline_files(efi_sy
size = files[j].size;
while (size) {
unsigned long chunksize;
-   if (size > __chunk_size)
+
+   if (IS_ENABLED(CONFIG_X86) && size > 
__chunk_size)
chunksize = __chunk_size;
else
chunksize = size;