[PATCH v15 03/16] s390, kexec_file: drop arch_kexec_walk_mem()

2018-09-27 Thread AKASHI Takahiro
Since s390 already knows where to locate the buffers, calling
arch_kexec_walk_mem() makes no sense, so we can simply drop it. A
pre-set kbuf->mem indicates this case, while all other architectures
initialize it to 0.

This change is preparatory work for the next patch, where all the
variant memory walks, whether based on system resources or on memblock,
will be put in one common place so that it satisfies all architectures'
needs.
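
To illustrate what this enables (a hypothetical helper, not part of this
patch): an architecture loader that already knows the final load address
can simply pre-set kbuf->mem, and the generic code will then skip the
free-memory search:

	/* Hypothetical sketch: a pre-set kbuf->mem skips the memory walk. */
	static int add_buffer_at_fixed_addr(struct kexec_buf *kbuf,
					    unsigned long fixed_addr)
	{
		kbuf->mem = fixed_addr;		/* != KEXEC_BUF_MEM_UNKNOWN */
		return kexec_add_buffer(kbuf);	/* mem hole search is skipped */
	}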

Signed-off-by: AKASHI Takahiro 
Reviewed-by: Philipp Rudo 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Dave Young 
Cc: Vivek Goyal 
Cc: Baoquan He 
---
 arch/s390/kernel/machine_kexec_file.c | 10 --
 include/linux/kexec.h |  8 
 kernel/kexec_file.c   |  4 
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/s390/kernel/machine_kexec_file.c 
b/arch/s390/kernel/machine_kexec_file.c
index f413f57f8d20..32023b4f9dc0 100644
--- a/arch/s390/kernel/machine_kexec_file.c
+++ b/arch/s390/kernel/machine_kexec_file.c
@@ -134,16 +134,6 @@ int kexec_file_add_initrd(struct kimage *image, struct 
s390_load_data *data,
return ret;
 }
 
-/*
- * The kernel is loaded to a fixed location. Turn off kexec_locate_mem_hole
- * and provide kbuf->mem by hand.
- */
-int arch_kexec_walk_mem(struct kexec_buf *kbuf,
-   int (*func)(struct resource *, void *))
-{
-   return 1;
-}
-
 int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
 Elf_Shdr *section,
 const Elf_Shdr *relsec,
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 49ab758f4d91..f378cb786f1b 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -145,6 +145,14 @@ int kexec_image_probe_default(struct kimage *image, void 
*buf,
  unsigned long buf_len);
 int kexec_image_post_load_cleanup_default(struct kimage *image);
 
+/*
+ * If kexec_buf.mem is set to this value, kexec_locate_mem_hole()
+ * will try to allocate free memory. Arch may overwrite it.
+ */
+#ifndef KEXEC_BUF_MEM_UNKNOWN
+#define KEXEC_BUF_MEM_UNKNOWN 0
+#endif
+
 /**
  * struct kexec_buf - parameters for finding a place for a buffer in memory
  * @image: kexec image in which memory to search.
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 63c7ce1c0c3e..0fcaa86219d1 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -534,6 +534,10 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
 {
int ret;
 
+   /* Arch knows where to place */
+   if (kbuf->mem != KEXEC_BUF_MEM_UNKNOWN)
+   return 0;
+
ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
 
return ret == 1 ? 0 : -EADDRNOTAVAIL;
-- 
2.19.0




[PATCH v15 02/16] kexec_file: make kexec_image_post_load_cleanup_default() global

2018-09-27 Thread AKASHI Takahiro
Change this function from static to global so that arm64 can implement
its own arch_kimage_file_post_load_cleanup() later using
kexec_image_post_load_cleanup_default().
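
As a rough sketch of the intended use (the shape of the later arm64 patch
is assumed here and may differ in detail), arm64's own cleanup hook can
free its arch-specific data and then fall back to the default:

	int arch_kimage_file_post_load_cleanup(struct kimage *image)
	{
		vfree(image->arch.dtb);	/* arm64-specific dtb buffer (assumed) */
		image->arch.dtb = NULL;

		return kexec_image_post_load_cleanup_default(image);
	}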

Signed-off-by: AKASHI Takahiro 
Acked-by: Dave Young 
Cc: Vivek Goyal 
Cc: Baoquan He 
---
 include/linux/kexec.h | 1 +
 kernel/kexec_file.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 9e4e638fb505..49ab758f4d91 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -143,6 +143,7 @@ extern const struct kexec_file_ops * const 
kexec_file_loaders[];
 
 int kexec_image_probe_default(struct kimage *image, void *buf,
  unsigned long buf_len);
+int kexec_image_post_load_cleanup_default(struct kimage *image);
 
 /**
  * struct kexec_buf - parameters for finding a place for a buffer in memory
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index c6a3b6851372..63c7ce1c0c3e 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -78,7 +78,7 @@ void * __weak arch_kexec_kernel_image_load(struct kimage 
*image)
return kexec_image_load_default(image);
 }
 
-static int kexec_image_post_load_cleanup_default(struct kimage *image)
+int kexec_image_post_load_cleanup_default(struct kimage *image)
 {
if (!image->fops || !image->fops->cleanup)
return 0;
-- 
2.19.0




[PATCH v15 04/16] powerpc, kexec_file: factor out memblock-based arch_kexec_walk_mem()

2018-09-27 Thread AKASHI Takahiro
The memblock list is another source for the usable system memory layout.
Move powerpc's arch_kexec_walk_mem() to common code so that other
memblock-based architectures, particularly arm64, can also utilise it.
The moved function is renamed to kexec_walk_memblock() and integrated
into kexec_locate_mem_hole(), which is now usable by all architectures
with no need to override arch_kexec_walk_mem().

With this change, arch_kexec_walk_mem() no longer needs to be a weak
function, and it is renamed to kexec_walk_resources().

Since powerpc doesn't support kdump in its kexec_file_load(), the current
kexec_walk_memblock() won't work for kdump in this form either; this will
be fixed in the next patch.
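
For reference, the resulting shape of kexec_locate_mem_hole() after this
patch is roughly the following (simplified sketch):

	int kexec_locate_mem_hole(struct kexec_buf *kbuf)
	{
		int ret;

		/* Arch knows where to place */
		if (kbuf->mem != KEXEC_BUF_MEM_UNKNOWN)
			return 0;

		if (IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK))
			ret = kexec_walk_resources(kbuf, locate_mem_hole_callback);
		else
			ret = kexec_walk_memblock(kbuf, locate_mem_hole_callback);

		return ret == 1 ? 0 : -EADDRNOTAVAIL;
	}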

Signed-off-by: AKASHI Takahiro 
Cc: "Eric W. Biederman" 
Acked-by: Dave Young 
Cc: Vivek Goyal 
Cc: Baoquan He 
Acked-by: James Morse 
---
 arch/powerpc/kernel/machine_kexec_file_64.c | 54 ---
 include/linux/kexec.h   |  2 -
 kernel/kexec_file.c | 60 +++--
 3 files changed, 57 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c 
b/arch/powerpc/kernel/machine_kexec_file_64.c
index c77e95e9b384..0d20c7ad40fa 100644
--- a/arch/powerpc/kernel/machine_kexec_file_64.c
+++ b/arch/powerpc/kernel/machine_kexec_file_64.c
@@ -24,7 +24,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -46,59 +45,6 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void 
*buf,
return kexec_image_probe_default(image, buf, buf_len);
 }
 
-/**
- * arch_kexec_walk_mem - call func(data) for each unreserved memory block
- * @kbuf:  Context info for the search. Also passed to @func.
- * @func:  Function to call for each memory block.
- *
- * This function is used by kexec_add_buffer and kexec_locate_mem_hole
- * to find unreserved memory to load kexec segments into.
- *
- * Return: The memory walk will stop when func returns a non-zero value
- * and that value will be returned. If all free regions are visited without
- * func returning non-zero, then zero will be returned.
- */
-int arch_kexec_walk_mem(struct kexec_buf *kbuf,
-   int (*func)(struct resource *, void *))
-{
-   int ret = 0;
-   u64 i;
-   phys_addr_t mstart, mend;
-   struct resource res = { };
-
-   if (kbuf->top_down) {
-   for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
-   &mstart, &mend, NULL) {
-   /*
-* In memblock, end points to the first byte after the
-* range while in kexec, end points to the last byte
-* in the range.
-*/
-   res.start = mstart;
-   res.end = mend - 1;
-   ret = func(&res, kbuf);
-   if (ret)
-   break;
-   }
-   } else {
-   for_each_free_mem_range(i, NUMA_NO_NODE, 0, &mstart, &mend,
-   NULL) {
-   /*
-* In memblock, end points to the first byte after the
-* range while in kexec, end points to the last byte
-* in the range.
-*/
-   res.start = mstart;
-   res.end = mend - 1;
-   ret = func(&res, kbuf);
-   if (ret)
-   break;
-   }
-   }
-
-   return ret;
-}
-
 /**
  * setup_purgatory - initialize the purgatory's global variables
  * @image: kexec image.
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index f378cb786f1b..d58d1f2fab10 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -192,8 +192,6 @@ int __weak arch_kexec_apply_relocations(struct 
purgatory_info *pi,
const Elf_Shdr *relsec,
const Elf_Shdr *symtab);
 
-int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
-  int (*func)(struct resource *, void *));
 extern int kexec_add_buffer(struct kexec_buf *kbuf);
 int kexec_locate_mem_hole(struct kexec_buf *kbuf);
 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 0fcaa86219d1..370d7eab49fe 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -501,6 +502,55 @@ static int locate_mem_hole_callback(struct resource *res, 
void *arg)
return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+#if defined(CONFIG_HAVE_MEMBLOCK) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK)
+static int kexec_walk_memblock(struct kexec_buf *kbuf,
+  int (*func)(struct resource *, void *))
+{
+   int ret = 

[PATCH v15 00/16] arm64: kexec: add kexec_file_load() support

2018-09-27 Thread AKASHI Takahiro
This is the fifteenth round of implementing kexec_file_load() support
on arm64.[1] (See "Changes" below)
Most of the code is based on kexec-tools.

# Since v15, we need a few prerequisite patches; See "Changes."
#  You will find them in [1], too.


This patch series enables us to
  * load the kernel by specifying its file descriptor, instead of user-
filled buffer, at kexec_file_load() system call, and
  * optionally verify its signature at load time for trusted boot.
Kernel virtual address randomization is also supported since v9.

Contrary to the kexec_load() system call, and as we discussed a long time
ago, users are not allowed to provide a device tree for the second kernel
explicitly; the device tree blob of the first kernel is re-used
internally.

To use the kexec_file_load() system call instead of kexec_load(), the
'-s' option must be specified on the kexec command line. See [2] for the
necessary patch for kexec-tools.

To analyze a generated crash dump file, use the latest master branch of
crash utility[3]. I always try to submit patches to fix any inconsistencies
introduced in the latest kernel.

Regarding kernel image verification, a signature must be presented
along with the binary itself. A signature is basically a hash value
calculated over the whole binary data and encrypted with a key that is
authenticated against one of the system's trusted certificates.
Any attempt to read and load a to-be-kexec-ed kernel image through the
system call will be checked and blocked if the binary's hash value
doesn't match its associated signature.

There are two methods available now:
1. implementing arch-specific verification hook of kexec_file_load()
2. utilizing IMA(Integrity Measurement Architecture)[4] appraisal framework

Before my v7, I believed that my patches only supported (1), but I am now
confident that (2) comes for free if IMA is enabled and properly
configured.


(1) Arch-specific verification hook
If CONFIG_KEXEC_VERIFY_SIG is enabled, kexec_file_load() invokes an arch-
defined (and hence file-format-specific) hook function to check the
validity of the kernel binary.

On x86, a signature is embedded into the PE (Microsoft's format) header
of the binary. Since arm64's "Image" can also be seen as a PE file as
long as CONFIG_EFI is enabled, we adopt this format for kernel signing.

As in the case of UEFI applications, we can create a signed kernel image:
$ sbsign --key ${KEY} --cert ${CERT} Image

You may want to use certs/signing_key.pem, which is intended for module
signing (CONFIG_MODULE_SIG), as ${KEY} and ${CERT} for test purposes.
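
For reference, x86 implements this hook by verifying the PE signature
embedded in the bzImage; a minimal sketch of the same approach for an
arm64 "Image" (illustrative only, assuming CONFIG_EFI; the function name
and the keyring choice here are assumptions) would be:

	/* Sketch: PE signature check for the arm64 Image loader. */
	static int image_verify_sig(const char *kernel, unsigned long kernel_len)
	{
		/* NULL: use the kernel's built-in trusted keys */
		return verify_pefile_signature(kernel, kernel_len, NULL,
					       VERIFYING_KEXEC_PE_SIGNATURE);
	}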


(2) IMA appraisal-based
IMA was first introduced in Linux to meet the TCG (Trusted Computing
Group) requirement that all sensitive files be *measured* before being
read or executed, so that any untrusted changes/modifications can be
detected. The appraisal feature, which allows us to ensure the integrity
of files and even prevent them from being read or executed, was added
later.

Meanwhile, kexec_file_load() has been available since v3.17 and evolved
to enable IMA-appraisal-type verification with commit b804defe4297
("kexec: replace call to copy_file_from_fd() with kernel version").

In this scheme, a signature is stored in an extended file attribute,
"security.ima", while the verification key is held in a dedicated keyring,
".ima" or "_ima". The whole verification process is confined to a secure
API, kernel_read_file_from_fd(), called by kexec_file_load().
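
Concretely, the kexec_file_load() path reads the kernel image through that
API, roughly as follows (simplified excerpt; error handling trimmed):

	/* The read is measured and, per policy, appraised by IMA. */
	ret = kernel_read_file_from_fd(kernel_fd, &image->kernel_buf,
				       &size, INT_MAX, READING_KEXEC_IMAGE);
	if (ret < 0)
		return ret;
	image->kernel_buf_len = size;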

Please note that powerpc is one of the two architectures currently
supporting KEXEC_FILE, and that it wishes to extend IMA so that a
signature may be appended to the "vmlinux" file[5], like module signing,
instead of using an extended file attribute.

While IMA is meant to be used with a TPM (Trusted Platform Module) on a
secure platform, it is still usable without one. Here is an example
procedure for trying out the feature, using a self-signed root CA for
demo/test purposes:

 1) Generate needed keys and certificates, following "Generate trusted
keys" section in README of ima-evm-utils[6].

 2) Build the kernel with the following kernel configurations, specifying
"ima-local-ca.pem" for CONFIG_SYSTEM_TRUSTED_KEYS:
CONFIG_EXT4_FS_SECURITY
CONFIG_INTEGRITY_SIGNATURE
CONFIG_INTEGRITY_ASYMMETRIC_KEYS
CONFIG_INTEGRITY_TRUSTED_KEYRING
CONFIG_IMA
CONFIG_IMA_WRITE_POLICY
CONFIG_IMA_READ_POLICY
CONFIG_IMA_APPRAISE
CONFIG_IMA_APPRAISE_BOOTPARAM
CONFIG_SYSTEM_TRUSTED_KEYS
Please note that CONFIG_KEXEC_VERIFY_SIG is not, actually should
not be, enabled.

 3) Sign(label) a kernel image binary to be kexec-ed on target filesystem:
$ evmctl ima_sign --key /path/to/private_key.pem /your/Image

 4) Add a command line parameter and boot the kernel:
ima_appraise=enforce

 On live system,
 5) Set a security policy:
$ mount -t securityfs none /sys/kernel/security
$ echo "appraise func=KEXEC_KERNEL_CHECK ap

[PATCH v15 01/16] asm-generic: add kexec_file_load system call to unistd.h

2018-09-27 Thread AKASHI Takahiro
The initial user of this system call number is arm64.
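
A self-contained userspace example of invoking the new syscall number
directly (the file paths and command line below are illustrative only):

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>

	#ifndef __NR_kexec_file_load
	#define __NR_kexec_file_load 294	/* number added by this patch */
	#endif

	int main(void)
	{
		int kernel_fd = open("/boot/Image", O_RDONLY);
		int initrd_fd = open("/boot/initrd.img", O_RDONLY);
		const char *cmdline = "console=ttyAMA0 root=/dev/vda1";

		if (kernel_fd < 0 || initrd_fd < 0) {
			perror("open");
			return 1;
		}

		/* cmdline_len must include the terminating NUL */
		if (syscall(__NR_kexec_file_load, kernel_fd, initrd_fd,
			    strlen(cmdline) + 1, cmdline, 0UL) < 0) {
			perror("kexec_file_load");
			return 1;
		}
		return 0;
	}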

Signed-off-by: AKASHI Takahiro 
Acked-by: Arnd Bergmann 
---
 include/uapi/asm-generic/unistd.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/asm-generic/unistd.h 
b/include/uapi/asm-generic/unistd.h
index df4bedb9b01c..acea91e49523 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -736,9 +736,11 @@ __SYSCALL(__NR_statx, sys_statx)
 __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 #define __NR_rseq 293
 __SYSCALL(__NR_rseq, sys_rseq)
+#define __NR_kexec_file_load 294
+__SYSCALL(__NR_kexec_file_load, sys_kexec_file_load)
 
 #undef __NR_syscalls
-#define __NR_syscalls 294
+#define __NR_syscalls 295
 
 /*
  * 32 bit systems traditionally used different
-- 
2.19.0




Re: [PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread lijiang


在 2018年09月27日 22:03, Bjorn Helgaas 写道:
> On Thu, Sep 27, 2018 at 01:27:41PM +0800, lijiang wrote:
>> 在 2018年09月25日 06:15, Bjorn Helgaas 写道:
>>> From: Bjorn Helgaas 
>>>
>>> Previously find_next_iomem_res() used "*res" as both an input parameter for
>>> the range to search and the type of resource to search for, and an output
>>> parameter for the resource we found, which makes the interface confusing
>>> and hard to use correctly.
>>>
>>> All callers allocate a single struct resource and use it for repeated calls
>>> to find_next_iomem_res().  When find_next_iomem_res() returns a resource,
>>> it overwrites the start, end, flags, and desc members of the struct.  If we
>>> call find_next_iomem_res() again, we must update or restore these fields.
>>>
>>> The callers (__walk_iomem_res_desc() and walk_system_ram_range()) do not
>>> restore res->flags, so if the caller is searching for flags of
>>> IORESOURCE_MEM | IORESOURCE_BUSY and finds a resource with flags of
>>> IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search will
>>> find only resources marked as IORESOURCE_SYSRAM.
>>>
>>> Fix this by restructuring the interface so it takes explicit "start, end,
>>> flags" parameters and uses "*res" only as an output parameter.
>>
>> Hi, Bjorn
>> I personally suggest that some comments might be added in the code, make it 
>> clear
>> and easy to understand, then which could avoid the old confusion and more 
>> code changes.
> 
> Since I think the current interface (using *res as both input and
> output parameters that have very different meanings) is confusing,
> it's hard for *me* to write comments that make it less confusing, but
> of course, you're welcome to propose something.
> 
> My opinion (probably not universally shared) is that my proposal would
> make the code more readable, and it's worth doing even though the diff
> is larger.
> 
> Anyway, I'll post these patches independently and see if anybody else
> has an opinion.
> 

I don't mind at all; that was just my own opinion about the larger code
changes.

Anyway, thank you for improving this patch.

Regards,
Lianbo

> Bjorn
> 
>>> Original-patch: 
>>> http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
>>> Based-on-patch-by: Lianbo Jiang 
>>> Signed-off-by: Bjorn Helgaas 
>>> ---
>>>  kernel/resource.c |   94 
>>> +++--
>>>  1 file changed, 41 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/kernel/resource.c b/kernel/resource.c
>>> index 155ec873ea4d..9891ea90cc8f 100644
>>> --- a/kernel/resource.c
>>> +++ b/kernel/resource.c
>>> @@ -319,23 +319,26 @@ int release_resource(struct resource *old)
>>>  EXPORT_SYMBOL(release_resource);
>>>  
>>>  /*
>>> - * Finds the lowest iomem resource existing within [res->start..res->end].
>>> - * The caller must specify res->start, res->end, res->flags, and optionally
>>> - * desc.  If found, returns 0, res is overwritten, if not found, returns 
>>> -1.
>>> - * This function walks the whole tree and not just first level children 
>>> until
>>> - * and unless first_level_children_only is true.
>>> + * Finds the lowest iomem resource that covers part of [start..end].  The
>>> + * caller must specify start, end, flags, and desc (which may be
>>> + * IORES_DESC_NONE).
>>> + *
>>> + * If a resource is found, returns 0 and *res is overwritten with the part
>>> + * of the resource that's within [start..end]; if none is found, returns
>>> + * -1.
>>> + *
>>> + * This function walks the whole tree and not just first level children
>>> + * unless first_level_children_only is true.
>>>   */
>>> -static int find_next_iomem_res(struct resource *res, unsigned long desc,
>>> -  bool first_level_children_only)
>>> +static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>>> +  unsigned long flags, unsigned long desc,
>>> +  bool first_level_children_only,
>>> +  struct resource *res)
>>>  {
>>> -   resource_size_t start, end;
>>> struct resource *p;
>>> bool sibling_only = false;
>>>  
>>> BUG_ON(!res);
>>> -
>>> -   start = res->start;
>>> -   end = res->end;
>>> BUG_ON(start >= end);
>>>  
>>> if (first_level_children_only)
>>> @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, 
>>> unsigned long desc,
>>> read_lock(&resource_lock);
>>>  
>>> for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
>>> -   if ((p->flags & res->flags) != res->flags)
>>> +   if ((p->flags & flags) != flags)
>>> continue;
>>> if ((desc != IORES_DESC_NONE) && (desc != p->desc))
>>> continue;
>>> @@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, 
>>> unsigned long desc,
>>> read_unlock(&resource_lock);
>>> if (!p)
>>> return -1;
>>> +
>>> /* copy data */
>>> -   if (res->start < 

Re: [PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-09-27 Thread lijiang
在 2018年09月28日 00:53, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 03:19:52PM +0800, Lianbo Jiang wrote:
>> When SME is enabled in the first kernel, we will allocate unencrypted pages
>> for kdump in order to be able to boot the kdump kernel like kexec.
> 
> This is not what the commit does - it marks the control pages as
> decrypted when SME. Why doesn't the commit message state that and why is
> this being done?
> 
OK, I will improve this patch log.

If SME is active, we need to mark the control pages as decrypted, because
when we boot into the kdump kernel these pages won't be accessed through
the encrypted mapping at the initial stage; this lets the kdump kernel
boot in the same manner as the originally booted kernel.

>> Signed-off-by: Lianbo Jiang 
>> Reviewed-by: Tom Lendacky 
>> ---
>>  kernel/kexec_core.c | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 23a83a4da38a..e7efcd1a977b 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -471,6 +471,16 @@ static struct page 
>> *kimage_alloc_crash_control_pages(struct kimage *image,
>>  }
>>  }
>>  
>> +if (pages) {
>> +/*
>> + * For kdump, we need to ensure that these pages are
>> + * unencrypted pages if SME is enabled.
> 
> Remember to always call unencrypted pages "decrypted" - this is the
> convention we agreed upon and it should keep the confusion level at
> minimum to others staring at this code.
> 
OK, I will improve this comment.

>> + * By the way, it is unnecessary to call the arch_
>> + * kexec_pre_free_pages(), which will make the code
>> + * become more simple.
>> + */
> 
> This second sentence I don't understand...
> 

There are two functions that are usually called in pairs:
arch_kexec_post_alloc_pages() and arch_kexec_pre_free_pages().

One marks the pages as decrypted, the other marks them as encrypted again.

But for the crash control pages, there is no need to call
arch_kexec_pre_free_pages(), for three reasons:
1. Crash pages are reserved in memblock and are only used by kdump;
   nobody else uses these pages;

2. Whenever crash pages are allocated, they are always marked as
   decrypted (when SME is active);

3. If we planned to call arch_kexec_pre_free_pages(), we would have to
   store these pages somewhere, which would require more code changes.

So, here it is redundant to call arch_kexec_pre_free_pages().
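
In shorthand, the two usage patterns described above look like this
(illustrative sketch only, not the actual kernel functions):

	/* Transiently used segment page: decrypt, copy, re-encrypt. */
	static void transient_segment_page(struct page *page)
	{
		arch_kexec_post_alloc_pages(page_address(page), 1, 0);
		/* ... copy the segment data into the page ... */
		arch_kexec_pre_free_pages(page_address(page), 1);
	}

	/* Crash control pages: reserved crash memory, stay decrypted, so
	 * there is no matching arch_kexec_pre_free_pages() call. */
	static void crash_control_pages(struct page *pages, unsigned int order)
	{
		arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
	}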

Thanks.
Lianbo
>> +arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
>> +}
>>  return pages;
>>  }
>>  
>> @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage 
>> *image,
>>  result  = -ENOMEM;
>>  goto out;
>>  }
>> +arch_kexec_post_alloc_pages(page_address(page), 1, 0);
>>  ptr = kmap(page);
>>  ptr += maddr & ~PAGE_MASK;
>>  mchunk = min_t(size_t, mbytes,
>> @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage 
>> *image,
>>  result = copy_from_user(ptr, buf, uchunk);
>>  kexec_flush_icache_page(page);
>>  kunmap(page);
>> +arch_kexec_pre_free_pages(page_address(page), 1);
>>  if (result) {
>>  result = -EFAULT;
>>  goto out;
>> -- 
>> 2.17.1
>>
> 



Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月28日 00:10, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote:
>> If no need to break this line, it will cause a warning of exceeding 80 
>> characters per line.
> 
> That's fine - we don't take the 80 cols rule blindly but apply common
> sense. In this particular case the lines can stick out because they're
> simply externs and are meant to be skimmed over and not really read. :)
> 
OK, I see. Thanks.

Thank you for pointing out this issue; I forgot to remove this header
file.

I will get rid of it and post this patch again.
> 
> No need - already fixed that up.
> 
Great, thank you so much for your help.

Regards,
Lianbo
> Thx.
> 



Re: [PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-09-27 Thread Borislav Petkov
On Thu, Sep 27, 2018 at 03:19:52PM +0800, Lianbo Jiang wrote:
> When SME is enabled in the first kernel, we will allocate unencrypted pages
> for kdump in order to be able to boot the kdump kernel like kexec.

This is not what the commit does - it marks the control pages as
decrypted when SME. Why doesn't the commit message state that and why is
this being done?

> Signed-off-by: Lianbo Jiang 
> Reviewed-by: Tom Lendacky 
> ---
>  kernel/kexec_core.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 23a83a4da38a..e7efcd1a977b 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -471,6 +471,16 @@ static struct page 
> *kimage_alloc_crash_control_pages(struct kimage *image,
>   }
>   }
>  
> + if (pages) {
> + /*
> +  * For kdump, we need to ensure that these pages are
> +  * unencrypted pages if SME is enabled.

Remember to always call unencrypted pages "decrypted" - this is the
convention we agreed upon and it should keep the confusion level at
minimum to others staring at this code.

> +  * By the way, it is unnecessary to call the arch_
> +  * kexec_pre_free_pages(), which will make the code
> +  * become more simple.
> +  */

This second sentence I don't understand...

> + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
> + }
>   return pages;
>  }
>  
> @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>   result  = -ENOMEM;
>   goto out;
>   }
> + arch_kexec_post_alloc_pages(page_address(page), 1, 0);
>   ptr = kmap(page);
>   ptr += maddr & ~PAGE_MASK;
>   mchunk = min_t(size_t, mbytes,
> @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>   result = copy_from_user(ptr, buf, uchunk);
>   kexec_flush_icache_page(page);
>   kunmap(page);
> + arch_kexec_pre_free_pages(page_address(page), 1);
>   if (result) {
>   result = -EFAULT;
>   goto out;
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)



Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread Borislav Petkov
On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote:
> If no need to break this line, it will cause a warning of exceeding 80 
> characters per line.

That's fine - we don't take the 80 cols rule blindly but apply common
sense. In this particular case the lines can stick out because they're
simply externs and are meant to be skimmed over and not really read. :)

> Thank you for pointing out this issue, i forgot to remove this header file. 
> 
> I will get rid of this header file and post this patch again.

No need - already fixed that up.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)



Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月27日 21:17, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 03:19:51PM +0800, Lianbo Jiang wrote:
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index 6de64840dd22..f8795f9581c7 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t 
>> offset, unsigned long size);
>>  #define ioremap_cache ioremap_cache
>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>> size, unsigned long prot_val);
>>  #define ioremap_prot ioremap_prot
>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
>> +unsigned long size);
> 
> No need to break this line - see how the other externs don't.
> 
OK, I will fix it.

If I don't break this line, it will cause a warning about exceeding 80
characters per line.

>> +#define ioremap_encrypted ioremap_encrypted
>>  
>>  /**
>>   * ioremap -   map bus memory into CPU space
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c63a545ec199..e01e6c695add 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -24,6 +24,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> What is that include for and why is it not up there with the  includes instead here with the  ones?
> 
Thank you for pointing out this issue; I forgot to remove this header
file.

I will get rid of it and post this patch again.

Regards,
Lianbo



[PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Previously find_next_iomem_res() used "*res" as both an input parameter for
the range to search and the type of resource to search for, and an output
parameter for the resource we found, which makes the interface confusing.

The current callers use find_next_iomem_res() incorrectly because they
allocate a single struct resource and use it for repeated calls to
find_next_iomem_res().  When find_next_iomem_res() returns a resource, it
overwrites the start, end, flags, and desc members of the struct.  If we
call find_next_iomem_res() again, we must update or restore these fields.
The previous code restored res.start and res.end, but not res.flags or
res.desc.

Since the callers did not restore res.flags, if they searched for flags
IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags
IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would
incorrectly skip resources unless they were also marked as
IORESOURCE_SYSRAM.

Fix this by restructuring the interface so it takes explicit "start, end,
flags" parameters and uses "*res" only as an output parameter.

Original-patch: 
http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
Based-on-patch-by: Lianbo Jiang 
Signed-off-by: Bjorn Helgaas 
---
 kernel/resource.c |   94 +++--
 1 file changed, 41 insertions(+), 53 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 155ec873ea4d..9891ea90cc8f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -319,23 +319,26 @@ int release_resource(struct resource *old)
 EXPORT_SYMBOL(release_resource);
 
 /*
- * Finds the lowest iomem resource existing within [res->start..res->end].
- * The caller must specify res->start, res->end, res->flags, and optionally
- * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
- * This function walks the whole tree and not just first level children until
- * and unless first_level_children_only is true.
+ * Finds the lowest iomem resource that covers part of [start..end].  The
+ * caller must specify start, end, flags, and desc (which may be
+ * IORES_DESC_NONE).
+ *
+ * If a resource is found, returns 0 and *res is overwritten with the part
+ * of the resource that's within [start..end]; if none is found, returns
+ * -1.
+ *
+ * This function walks the whole tree and not just first level children
+ * unless first_level_children_only is true.
  */
-static int find_next_iomem_res(struct resource *res, unsigned long desc,
-  bool first_level_children_only)
+static int find_next_iomem_res(resource_size_t start, resource_size_t end,
+  unsigned long flags, unsigned long desc,
+  bool first_level_children_only,
+  struct resource *res)
 {
-   resource_size_t start, end;
struct resource *p;
bool sibling_only = false;
 
BUG_ON(!res);
-
-   start = res->start;
-   end = res->end;
BUG_ON(start >= end);
 
if (first_level_children_only)
@@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
read_lock(&resource_lock);
 
for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
-   if ((p->flags & res->flags) != res->flags)
+   if ((p->flags & flags) != flags)
continue;
if ((desc != IORES_DESC_NONE) && (desc != p->desc))
continue;
@@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
read_unlock(&resource_lock);
if (!p)
return -1;
+
/* copy data */
-   if (res->start < p->start)
-   res->start = p->start;
-   if (res->end > p->end)
-   res->end = p->end;
+   res->start = max(start, p->start);
+   res->end = min(end, p->end);
res->flags = p->flags;
res->desc = p->desc;
return 0;
 }
 
-static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
-bool first_level_children_only,
-void *arg,
+static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
+unsigned long flags, unsigned long desc,
+bool first_level_children_only, void *arg,
 int (*func)(struct resource *, void *))
 {
-   u64 orig_end = res->end;
+   struct resource res;
int ret = -1;
 
-   while ((res->start < res->end) &&
-  !find_next_iomem_res(res, desc, first_level_children_only)) {
-   ret = (*func)(res, arg);
+   while (start < end &&
+  !find_next_iomem_res(start, end, flags, desc,
+   first_level_children_only, &res)) {
+   ret = (*func

[PATCH 2/3] resource: Include resource end in walk_*() interfaces

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

find_next_iomem_res() finds an iomem resource that covers part of a range
described by "start, end".  All callers expect that range to be inclusive,
i.e., both start and end are included, but find_next_iomem_res() doesn't
handle the end address correctly.

If it finds an iomem resource that contains exactly the end address, it
skips it, e.g., if "start, end" is [0x0-0x1] and there happens to be an
iomem resource [mem 0x1-0x1] (the single byte at 0x1), we skip
it:

  find_next_iomem_res(...)
  {
start = 0x0;
end = 0x1;
for (p = next_resource(...)) {
  # p->start = 0x1;
  # p->end = 0x1;
  # we *should* return this resource, but this condition is false:
  if ((p->end >= start) && (p->start < end))
break;

Adjust find_next_iomem_res() so it allows a resource that includes the
single byte at the end of the range.  This is a corner case that we
probably don't see in practice.

Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch 
range fix")
Signed-off-by: Bjorn Helgaas 
---
 kernel/resource.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 30e1bc68503b..155ec873ea4d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -319,7 +319,7 @@ int release_resource(struct resource *old)
 EXPORT_SYMBOL(release_resource);
 
 /*
- * Finds the lowest iomem resource existing within [res->start.res->end).
+ * Finds the lowest iomem resource existing within [res->start..res->end].
  * The caller must specify res->start, res->end, res->flags, and optionally
  * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
  * This function walks the whole tree and not just first level children until
@@ -352,7 +352,7 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
p = NULL;
break;
}
-   if ((p->end >= start) && (p->start < end))
+   if ((p->end >= start) && (p->start <= end))
break;
}
 




[PATCH 0/3] find_next_iomem_res() fixes

2018-09-27 Thread Bjorn Helgaas
These fix:

  - A kexec off-by-one error that we probably never see in practice (only
affects a resource starting at exactly 640KB).

  - A corner case in walk_iomem_res_desc(), walk_system_ram_res(), etc that
we probably also never see in practice (only affects a single byte
resource at the very end of the region we're searching)

  - An issue in walk_iomem_res_desc() that apparently causes a kdump issue
(see Lianbo's note at [1]).

I think we need to fix the kdump issue either by these patches
(specifically the last one) or by the patch Lianbo posted [2].

I'm hoping to avoid deciding and merging these myself, but I'm not
sure who really wants to own kernel/resource.c.

[1] https://lore.kernel.org/lkml/01551d06-c421-5df3-b19f-fc66f3639...@redhat.com
[2] https://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com

---

Bjorn Helgaas (3):
  x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
  resource: Include resource end in walk_*() interfaces
  resource: Fix find_next_iomem_res() iteration issue


 arch/x86/include/asm/kexec.h |2 -
 kernel/resource.c|   96 ++
 2 files changed, 43 insertions(+), 55 deletions(-)



[PATCH 1/3] x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The only use of KEXEC_BACKUP_SRC_END is as an argument to
walk_system_ram_res():

  int crash_load_segments(struct kimage *image)
  {
...
walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
image, determine_backup_region);

walk_system_ram_res() expects "start, end" arguments that are inclusive,
i.e., the range to be walked includes both the start and end addresses.

KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the
first address *past* the desired 0-640KB range.

Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC
region is [0-0x9ffff], not [0-0xa0000].
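
A trivial userspace check of the inclusive-range arithmetic (illustrative
only):

	#include <assert.h>

	int main(void)
	{
		unsigned long start = 0UL;		/* KEXEC_BACKUP_SRC_START */
		unsigned long end = 640 * 1024UL - 1;	/* new KEXEC_BACKUP_SRC_END */

		/* an inclusive [start..end] range covers exactly 640 KiB */
		assert(end - start + 1 == 640 * 1024UL);
		return 0;
	}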

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas 
---
 arch/x86/include/asm/kexec.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index f327236f0fa7..5125fca472bb 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -67,7 +67,7 @@ struct kimage;
 
 /* Memory to backup during crash kdump */
 #define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL)  /* 640K */
+#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
 
 /*
  * CPU does not save ss and sp on stack if execution is already




Re: [PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread Bjorn Helgaas
On Thu, Sep 27, 2018 at 01:27:41PM +0800, lijiang wrote:
> 在 2018年09月25日 06:15, Bjorn Helgaas 写道:
> > From: Bjorn Helgaas 
> > 
> > Previously find_next_iomem_res() used "*res" as both an input parameter for
> > the range to search and the type of resource to search for, and an output
> > parameter for the resource we found, which makes the interface confusing
> > and hard to use correctly.
> > 
> > All callers allocate a single struct resource and use it for repeated calls
> > to find_next_iomem_res().  When find_next_iomem_res() returns a resource,
> > it overwrites the start, end, flags, and desc members of the struct.  If we
> > call find_next_iomem_res() again, we must update or restore these fields.
> > 
> > The callers (__walk_iomem_res_desc() and walk_system_ram_range()) do not
> > restore res->flags, so if the caller is searching for flags of
> > IORESOURCE_MEM | IORESOURCE_BUSY and finds a resource with flags of
> > IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search will
> > find only resources marked as IORESOURCE_SYSRAM.
> > 
> > Fix this by restructuring the interface so it takes explicit "start, end,
> > flags" parameters and uses "*res" only as an output parameter.
> 
> Hi, Bjorn
> I personally suggest that some comments might be added in the code, make it 
> clear
> and easy to understand, then which could avoid the old confusion and more 
> code changes.

Since I think the current interface (using *res as both input and
output parameters that have very different meanings) is confusing,
it's hard for *me* to write comments that make it less confusing, but
of course, you're welcome to propose something.

My opinion (probably not universally shared) is that my proposal would
make the code more readable, and it's worth doing even though the diff
is larger.

Anyway, I'll post these patches independently and see if anybody else
has an opinion.

Bjorn

> > Original-patch: 
> > http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
> > Based-on-patch-by: Lianbo Jiang 
> > Signed-off-by: Bjorn Helgaas 
> > ---
> >  kernel/resource.c |   94 
> > +++--
> >  1 file changed, 41 insertions(+), 53 deletions(-)
> > 
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 155ec873ea4d..9891ea90cc8f 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -319,23 +319,26 @@ int release_resource(struct resource *old)
> >  EXPORT_SYMBOL(release_resource);
> >  
> >  /*
> > - * Finds the lowest iomem resource existing within [res->start..res->end].
> > - * The caller must specify res->start, res->end, res->flags, and optionally
> > - * desc.  If found, returns 0, res is overwritten, if not found, returns 
> > -1.
> > - * This function walks the whole tree and not just first level children 
> > until
> > - * and unless first_level_children_only is true.
> > + * Finds the lowest iomem resource that covers part of [start..end].  The
> > + * caller must specify start, end, flags, and desc (which may be
> > + * IORES_DESC_NONE).
> > + *
> > + * If a resource is found, returns 0 and *res is overwritten with the part
> > + * of the resource that's within [start..end]; if none is found, returns
> > + * -1.
> > + *
> > + * This function walks the whole tree and not just first level children
> > + * unless first_level_children_only is true.
> >   */
> > -static int find_next_iomem_res(struct resource *res, unsigned long desc,
> > -  bool first_level_children_only)
> > +static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> > +  unsigned long flags, unsigned long desc,
> > +  bool first_level_children_only,
> > +  struct resource *res)
> >  {
> > -   resource_size_t start, end;
> > struct resource *p;
> > bool sibling_only = false;
> >  
> > BUG_ON(!res);
> > -
> > -   start = res->start;
> > -   end = res->end;
> > BUG_ON(start >= end);
> >  
> > if (first_level_children_only)
> > @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, 
> > unsigned long desc,
> > read_lock(&resource_lock);
> >  
> > for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
> > -   if ((p->flags & res->flags) != res->flags)
> > +   if ((p->flags & flags) != flags)
> > continue;
> > if ((desc != IORES_DESC_NONE) && (desc != p->desc))
> > continue;
> > @@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, 
> > unsigned long desc,
> > read_unlock(&resource_lock);
> > if (!p)
> > return -1;
> > +
> > /* copy data */
> > -   if (res->start < p->start)
> > -   res->start = p->start;
> > -   if (res->end > p->end)
> > -   res->end = p->end;
> > +   res->start = max(start, p->start);
> > +   res->end = min(end, p->end);
> > res->flags = p

Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread Borislav Petkov
On Thu, Sep 27, 2018 at 03:19:51PM +0800, Lianbo Jiang wrote:
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index 6de64840dd22..f8795f9581c7 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t 
> offset, unsigned long size);
>  #define ioremap_cache ioremap_cache
>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
> size, unsigned long prot_val);
>  #define ioremap_prot ioremap_prot
> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
> + unsigned long size);

No need to break this line - see how the other externs don't.

> +#define ioremap_encrypted ioremap_encrypted
>  
>  /**
>   * ioremap -   map bus memory into CPU space
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index c63a545ec199..e01e6c695add 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

What is that include for and why is it not up there with the  ones?

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)



Re: [PATCH] x86/boot: Fix kexec booting failure after SEV early boot support

2018-09-27 Thread Lendacky, Thomas
On 09/27/2018 07:38 AM, Kairui Song wrote:
> Commit 1958b5fc4010 ("x86/boot: Add early boot support when running
> with SEV active") is causing kexec becomes sometimes unstable even if
> SEV is not active. kexec reboot won't start a second kernel bypassing
> BIOS boot process, instead, the system got reset.
> 
> That's because, in get_sev_encryption_bit function, we are using
> 32-bit RIP-relative addressing to read the value of enc_bit, but
> kexec may alloc the early boot up code to a higher location, which
> is beyond 32-bit addressing limit. Some garbage will be read and
> get_sev_encryption_bit will return the wrong value, which leads to
> wrong memory page flag.
> 
> This patch removes the use of enc_bit, as currently, enc_bit's only
> purpose is to avoid duplicated encryption bit reading, but the overhead
> of reading encryption bit is so tiny, so no need to cache that.
> 
> Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV 
> active")
> Suggested-by: Borislav Petkov 
> Signed-off-by: Kairui Song 

Reviewed-by: Tom Lendacky 

> ---
>  arch/x86/boot/compressed/mem_encrypt.S | 19 ---
>  1 file changed, 19 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
> b/arch/x86/boot/compressed/mem_encrypt.S
> index eaa843a52907..a480356e0ed8 100644
> --- a/arch/x86/boot/compressed/mem_encrypt.S
> +++ b/arch/x86/boot/compressed/mem_encrypt.S
> @@ -25,20 +25,6 @@ ENTRY(get_sev_encryption_bit)
>   push%ebx
>   push%ecx
>   push%edx
> - push%edi
> -
> - /*
> -  * RIP-relative addressing is needed to access the encryption bit
> -  * variable. Since we are running in 32-bit mode we need this call/pop
> -  * sequence to get the proper relative addressing.
> -  */
> - call1f
> -1:   popl%edi
> - subl$1b, %edi
> -
> - movlenc_bit(%edi), %eax
> - cmpl$0, %eax
> - jge .Lsev_exit
>  
>   /* Check if running under a hypervisor */
>   movl$1, %eax
> @@ -69,15 +55,12 @@ ENTRY(get_sev_encryption_bit)
>  
>   movl%ebx, %eax
>   andl$0x3f, %eax /* Return the encryption bit location */
> - movl%eax, enc_bit(%edi)
>   jmp .Lsev_exit
>  
>  .Lno_sev:
>   xor %eax, %eax
> - movl%eax, enc_bit(%edi)
>  
>  .Lsev_exit:
> - pop %edi
>   pop %edx
>   pop %ecx
>   pop %ebx
> @@ -113,8 +96,6 @@ ENTRY(set_sev_encryption_mask)
>  ENDPROC(set_sev_encryption_mask)
>  
>   .data
> -enc_bit:
> - .int0x
>  
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>   .balign 8
> 


[PATCH] x86/boot: Fix kexec booting failure after SEV early boot support

2018-09-27 Thread Kairui Song
Commit 1958b5fc4010 ("x86/boot: Add early boot support when running
with SEV active") causes kexec to sometimes become unstable even if
SEV is not active: a kexec reboot won't start a second kernel bypassing
the BIOS boot process; instead, the system gets reset.

That's because the get_sev_encryption_bit() function uses 32-bit
RIP-relative addressing to read the value of enc_bit, but kexec may
place the early boot code at a higher location, beyond the 32-bit
addressing limit. Garbage is then read, get_sev_encryption_bit()
returns a wrong value, and that leads to a wrong memory page flag.

This patch removes the use of enc_bit. Its only purpose is to avoid
reading the encryption bit twice, but the overhead of reading it is so
tiny that there is no need to cache it.

Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV 
active")
Suggested-by: Borislav Petkov 
Signed-off-by: Kairui Song 
---
 arch/x86/boot/compressed/mem_encrypt.S | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index eaa843a52907..a480356e0ed8 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -25,20 +25,6 @@ ENTRY(get_sev_encryption_bit)
push%ebx
push%ecx
push%edx
-   push%edi
-
-   /*
-* RIP-relative addressing is needed to access the encryption bit
-* variable. Since we are running in 32-bit mode we need this call/pop
-* sequence to get the proper relative addressing.
-*/
-   call1f
-1: popl%edi
-   subl$1b, %edi
-
-   movlenc_bit(%edi), %eax
-   cmpl$0, %eax
-   jge .Lsev_exit
 
/* Check if running under a hypervisor */
movl$1, %eax
@@ -69,15 +55,12 @@ ENTRY(get_sev_encryption_bit)
 
movl%ebx, %eax
andl$0x3f, %eax /* Return the encryption bit location */
-   movl%eax, enc_bit(%edi)
jmp .Lsev_exit
 
 .Lno_sev:
xor %eax, %eax
-   movl%eax, enc_bit(%edi)
 
 .Lsev_exit:
-   pop %edi
pop %edx
pop %ecx
pop %ebx
@@ -113,8 +96,6 @@ ENTRY(set_sev_encryption_mask)
 ENDPROC(set_sev_encryption_mask)
 
.data
-enc_bit:
-   .int0xffffffff
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
.balign 8
-- 
2.17.1




Re: [PATCH v4 2/6] ima: prevent kexec_load syscall based on runtime secureboot flag

2018-09-27 Thread Mimi Zohar
[Cc'ing the kexec mailing list, and Seth]

On Wed, 2018-09-26 at 17:52 +0530, Nayna Jain wrote:
> When CONFIG_KEXEC_VERIFY_SIG is enabled, the kexec_file_load syscall
> requires the kexec'd kernel image to be signed. Distros are concerned
> about totally disabling the kexec_load syscall. As a compromise, the
> kexec_load syscall will only be disabled when CONFIG_KEXEC_VERIFY_SIG
> is configured and the system is booted with secureboot enabled.
> 
> This patch disables the kexec_load syscall only for systems booted with
> secureboot enabled.
> 
> Signed-off-by: Nayna Jain 

Nice!

Mimi

> ---
>  security/integrity/ima/ima_main.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/security/integrity/ima/ima_main.c 
> b/security/integrity/ima/ima_main.c
> index dce0a8a217bb..bdb6e5563d05 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -505,20 +505,24 @@ int ima_post_read_file(struct file *file, void *buf, 
> loff_t size,
>   */
>  int ima_load_data(enum kernel_load_data_id id)
>  {
> - bool sig_enforce;
> + bool ima_enforce, sig_enforce;
>  
> - if ((ima_appraise & IMA_APPRAISE_ENFORCE) != IMA_APPRAISE_ENFORCE)
> - return 0;
> + ima_enforce =
> + (ima_appraise & IMA_APPRAISE_ENFORCE) == IMA_APPRAISE_ENFORCE;
>  
>   switch (id) {
>   case LOADING_KEXEC_IMAGE:
> - if (ima_appraise & IMA_APPRAISE_KEXEC) {
> +#ifdef CONFIG_KEXEC_VERIFY_SIG
> + if (arch_ima_get_secureboot())
> + return -EACCES;
> +#endif
> + if (ima_enforce && (ima_appraise & IMA_APPRAISE_KEXEC)) {
>   pr_err("impossible to appraise a kernel image without a 
> file descriptor; try using kexec_file_load syscall.\n");
>   return -EACCES; /* INTEGRITY_UNKNOWN */
>   }
>   break;
>   case LOADING_FIRMWARE:
> - if (ima_appraise & IMA_APPRAISE_FIRMWARE) {
> + if (ima_enforce && (ima_appraise & IMA_APPRAISE_FIRMWARE)) {
>   pr_err("Prevent firmware sysfs fallback loading.\n");
>   return -EACCES; /* INTEGRITY_UNKNOWN */
>   }
> @@ -526,7 +530,8 @@ int ima_load_data(enum kernel_load_data_id id)
>   case LOADING_MODULE:
>   sig_enforce = is_module_sig_enforced();
>  
> - if (!sig_enforce && (ima_appraise & IMA_APPRAISE_MODULES)) {
> + if (ima_enforce && (!sig_enforce
> + && (ima_appraise & IMA_APPRAISE_MODULES))) {
>   pr_err("impossible to appraise a module without a file 
> descriptor. sig_enforce kernel parameter might help\n");
>   return -EACCES; /* INTEGRITY_UNKNOWN */
>   }




Re: [PATCH v4 1/6] x86/ima: define arch_ima_get_secureboot

2018-09-27 Thread Mimi Zohar
[Cc'ing the kexec mailing list, and Seth]

On Wed, 2018-09-26 at 17:52 +0530, Nayna Jain wrote:
> Distros are concerned about totally disabling the kexec_load syscall.
> As a compromise, the kexec_load syscall will only be disabled when
> CONFIG_KEXEC_VERIFY_SIG is configured and the system is booted with
> secureboot enabled.
> 
> This patch defines the new arch specific function called
> arch_ima_get_secureboot() to retrieve the secureboot state of the system.
> 
> Signed-off-by: Nayna Jain 
> Suggested-by: Seth Forshee 

Nice!

Mimi

> ---
>  arch/x86/kernel/Makefile   |  2 ++
>  arch/x86/kernel/ima_arch.c | 17 +
>  include/linux/ima.h|  9 +
>  3 files changed, 28 insertions(+)
>  create mode 100644 arch/x86/kernel/ima_arch.c
> 
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 02d6f5cf4e70..f32406e51424 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -149,3 +149,5 @@ ifeq ($(CONFIG_X86_64),y)
>   obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
>   obj-y   += vsmp_64.o
>  endif
> +
> +obj-$(CONFIG_IMA)+= ima_arch.o
> diff --git a/arch/x86/kernel/ima_arch.c b/arch/x86/kernel/ima_arch.c
> new file mode 100644
> index ..bb5a88d2b271
> --- /dev/null
> +++ b/arch/x86/kernel/ima_arch.c
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Copyright (C) 2018 IBM Corporation
> + */
> +#include 
> +#include 
> +
> +extern struct boot_params boot_params;
> +
> +bool arch_ima_get_secureboot(void)
> +{
> + if (efi_enabled(EFI_BOOT) &&
> + (boot_params.secure_boot == efi_secureboot_mode_enabled))
> + return true;
> + else
> + return false;
> +}
> diff --git a/include/linux/ima.h b/include/linux/ima.h
> index 84806b54b50a..4852255aa4f4 100644
> --- a/include/linux/ima.h
> +++ b/include/linux/ima.h
> @@ -30,6 +30,15 @@ extern void ima_post_path_mknod(struct dentry *dentry);
>  extern void ima_add_kexec_buffer(struct kimage *image);
>  #endif
>  
> +#ifdef CONFIG_X86
> +extern bool arch_ima_get_secureboot(void);
> +#else
> +static inline bool arch_ima_get_secureboot(void)
> +{
> + return false;
> +}
> +#endif
> +
>  #else
>  static inline int ima_bprm_check(struct linux_binprm *bprm)
>  {




[PATCH v7 RESEND 4/4] kdump/vmcore: support encrypted old memory with SME enabled

2018-09-27 Thread Lianbo Jiang
In the kdump kernel, we need to dump the old memory into the vmcore file.
If SME is enabled in the first kernel, we have to remap the old memory
with the memory encryption mask, so that it is automatically decrypted
when read from DRAM.

For SME kdump, there are two cases that are not supported:

 ----------------------------------------------
| first-kernel | second-kernel | kdump support |
|      (mem_encrypt=on|off)    |   (yes|no)    |
|--------------+---------------+---------------|
|     on       |     on        |     yes       |
|     off      |     off       |     yes       |
|     on       |     off       |     no        |
|     off      |     on        |     no        |
|______________|_______________|_______________|

1. SME is enabled in the first kernel, but SME is disabled in the kdump
kernel. In this case, because the old memory is encrypted, we can't
decrypt it.

2. SME is disabled in the first kernel, but SME is enabled in the kdump
kernel. On the one hand, the old memory is unencrypted, so it can be
dumped as usual and we don't need to enable SME in the kdump kernel; on
the other hand, it would increase the complexity of the code, because we
would have to consider how to pass the SME flag from the first kernel to
the kdump kernel, which is really too expensive.

These patches are only for SME kdump; they don't support SEV kdump.
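
The read path this enables looks roughly like the following (simplified
sketch; the helper name here is illustrative, the full change is in the
diff below):

	static ssize_t read_oldmem_pfn(unsigned long pfn, char *buf, size_t csize,
				       unsigned long offset, int userbuf,
				       bool encrypted)
	{
		/* transparently decrypt old memory that SME had encrypted */
		if (encrypted)
			return copy_oldmem_page_encrypted(pfn, buf, csize,
							  offset, userbuf);

		return copy_oldmem_page(pfn, buf, csize, offset, userbuf);
	}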

Signed-off-by: Lianbo Jiang 
Reviewed-by: Tom Lendacky 
---
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 21 +++
 include/linux/crash_dump.h   | 12 +++
 4 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8824d01c0c35..dfbeae0e35ce 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c 
b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index ..e1b1a577f197
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ * Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cbde728f8ac6..3065c8bada6a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "internal.h"
 
 /* List representing chunks of contiguous memory areas and their offsets in
@@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
unsigned long pfn, offset;
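
The rest of this vmcore.c hunk is cut off in the archive. As a rough sketch of
its intent (the helper name below is invented for illustration and is not the
actual hunk), the new "encrypted" argument would steer each page copy to the
SME-aware helper added above:

#include <linux/crash_dump.h>

/*
 * Illustration only: pick the copy helper for one old-memory page
 * depending on whether the first kernel's memory was SME-encrypted.
 */
static ssize_t copy_one_oldmem_page(unsigned long pfn, char *buf,
				    size_t nr_bytes, unsigned long offset,
				    int userbuf, bool encrypted)
{
	if (encrypted)
		return copy_oldmem_page_encrypted(pfn, buf, nr_bytes,
						  offset, userbuf);

	return copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);
}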
   

[PATCH v7 RESEND 3/4] iommu/amd: Remap the device table of IOMMU with the memory encryption mask for kdump

2018-09-27 Thread Lianbo Jiang
The kdump kernel copies the IOMMU device table from the old device table,
which is encrypted when SME is enabled in the first kernel. So we have to
remap the old device table with the memory encryption mask.

Signed-off-by: Lianbo Jiang 
Reviewed-by: Tom Lendacky 
Acked-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 84b3e4445d46..3931c7de7c69 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -902,12 +902,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+* When SME is enabled in the first kernel, the entry includes the
+* memory encryption mask(sme_me_mask), we must remove the memory
+* encryption mask to obtain the true physical address in kdump kernel.
+*/
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
if (old_devtb_phys >= 0x100000000ULL) {
pr_err("The address of old device table is above 4G, not trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-09-27 Thread Lianbo Jiang
When SME is enabled in the first kernel, allocate unencrypted pages for kdump
so that the kdump kernel can be booted in the same way as a regular kexec
kernel.
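
For reference, x86 already provides the hooks used below for regular kexec;
roughly, they clear the page encryption attribute after allocation and restore
it before freeing (a sketch of their approximate shape, not part of this
patch):

#include <linux/mm.h>
#include <asm/set_memory.h>

/*
 * Approximate shape of the existing x86 hooks (see
 * arch/x86/kernel/machine_kexec_64.c): while SME is active, kexec
 * control pages must be accessed unencrypted, so the C-bit is cleared
 * after allocation and restored before the pages are freed.
 */
int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
{
	return set_memory_decrypted((unsigned long)vaddr, pages);
}

int arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
{
	return set_memory_encrypted((unsigned long)vaddr, pages);
}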

Signed-off-by: Lianbo Jiang 
Reviewed-by: Tom Lendacky 
---
 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4da38a..e7efcd1a977b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+    * For kdump, these control pages must be accessible
+    * unencrypted when SME is enabled. There is no matching
+    * arch_kexec_pre_free_pages() call here, which keeps the
+    * code simple.
+    */
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread Lianbo Jiang
When SME is enabled on an AMD machine, the memory is encrypted in the first
kernel. In this case, SME also needs to be enabled in the kdump kernel, and
we have to remap the old memory with the memory encryption mask.

Here we only consider the case where SME is active in the first kernel and
is also active in the kdump kernel. There are four cases we need to
consider.

a. dump vmcore
   It is encrypted in the first kernel and needs to be read out in the
   kdump kernel.

b. crash notes
   When dumping the vmcore, people usually need to read useful information
   from the notes, and the notes are also encrypted.

c. iommu device table
   It is allocated by the kernel and its pointer is written into the MMIO
   space of the AMD IOMMU. It is encrypted in the first kernel, and we need
   to read the old content to analyze it and extract useful information.

d. mmio of amd iommu
   These registers are reported by AMD firmware; they are not RAM and are
   not encrypted in either the first kernel or the kdump kernel.

To achieve this, the solution is:
1. Add a new bool parameter "encrypted" to __ioremap_caller().
   This is a low-level function; it checks the newly added parameter and,
   if it is true and we are in the kdump kernel, remaps the memory with the
   SME mask.

2. Add a new function ioremap_encrypted() that explicitly passes "true"
   for "encrypted", as sketched after the call diagram below.
   For cases a, b and c above, we call ioremap_encrypted().

3. Adjust all existing ioremap wrapper functions to pass "false" for
   "encrypted", so that they behave as before.

   ioremap_encrypted() \
   ioremap_cache()     |
   ioremap_prot()      |
   ioremap_wt()        |-> __ioremap_caller()
   ioremap_wc()        |
   ioremap_uc()        |
   ioremap_nocache()   /
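
The hunk that adds the body of ioremap_encrypted() is cut off at the end of
this message; a sketch of its expected shape in arch/x86/mm/ioremap.c
(assuming a write-back cacheable mapping with the encryption bit requested)
would be:

/*
 * Sketch: same pattern as the other wrappers, but asking
 * __ioremap_caller() for an encrypted mapping.
 */
void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
{
	return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
				__builtin_return_address(0), true);
}
EXPORT_SYMBOL(ioremap_encrypted);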

Signed-off-by: Lianbo Jiang 
Reviewed-by: Tom Lendacky 
---
 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 #define ioremap_cache ioremap_cache
extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_retur

[PATCH v7 RESEND 0/4] Support kdump for AMD secure memory encryption(SME)

2018-09-27 Thread Lianbo Jiang
When SME is enabled on an AMD machine, we also need to support kdump. Because
the memory is encrypted in the first kernel, the old memory has to be remapped
in the kdump kernel in order to dump its data, and SME must also be enabled in
the kdump kernel, otherwise the old memory cannot be decrypted.

For kdump it is necessary to distinguish whether memory is encrypted and, more
than that, to know which parts of memory are encrypted and which are not, so
that each region can be remapped appropriately and the CPU is told how to
access it.

A page that is marked as encrypted is automatically decrypted when read from
DRAM and automatically encrypted when written to DRAM. So if the old memory is
encrypted, we have to remap it with the memory encryption mask, and it is then
decrypted transparently when we read the data.
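
Concretely, reading one page of encrypted old memory boils down to mapping it
with the encryption mask, copying it out and unmapping it again; a condensed
sketch of that read path (error handling and the user-space copy case omitted,
see patch 4/4 for the real code):

#include <linux/io.h>
#include <linux/string.h>

/* Condensed illustration of the read path for encrypted old memory. */
static ssize_t read_encrypted_oldmem_page(unsigned long pfn, char *buf,
					  size_t csize, unsigned long offset)
{
	void *vaddr;

	/* Map with the encryption mask so the hardware decrypts on read. */
	vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, PAGE_SIZE);
	if (!vaddr)
		return -ENOMEM;

	memcpy(buf, vaddr + offset, csize);
	iounmap((void __iomem *)vaddr);

	return csize;
}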

For kdump with SME, there are two cases that are not supported:

 _______________________________________________________________
| first kernel          | second kernel         | kdump support |
| (mem_encrypt=on|off)  | (mem_encrypt=on|off)  |   (yes|no)    |
|-----------------------+-----------------------+---------------|
| on                    | on                    | yes           |
| off                   | off                   | yes           |
| on                    | off                   | no            |
| off                   | on                    | no            |
|_______________________|_______________________|_______________|

1. SME is enabled in the first kernel but disabled in the kdump kernel.
The old memory is encrypted, and without SME in the kdump kernel we cannot
decrypt it.

2. SME is disabled in the first kernel but enabled in the kdump kernel.
There is no need to support this case: the old memory is unencrypted and can
be dumped as usual, so SME does not have to be enabled in the kdump kernel.
Moreover, supporting it would complicate the code, because we would have to
pass an SME flag from the first kernel to the kdump kernel so that the kdump
kernel knows whether the old memory is encrypted.

There are two ways to pass such an SME flag to the kdump kernel. The first is
to modify the assembly code, which touches common code and makes the path too
long. The second is to use kexec-tools: the first kernel would export the SME
flag via "proc" or "sysfs", kexec-tools would read it when loading the image
and save it in boot_params, and the kdump kernel could then remap the old
memory according to the saved flag. Either way, it is too expensive to do
this.

These patches cover SME kdump only; SEV kdump is not supported.

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit  "A draft for kdump vmcore about AMD SME"
Note: this makedumpfile draft can only dump the vmcore when SME is enabled.

crash-7.2.3: https://github.com/crash-utility/crash.git
commit <001f77a05585> "Fix for Linux 4.19-rc1 and later kernels that contain 
kernel commit <7290d5809571>"

kexec-tools-2.0.17: git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
commit  "kexec: fix for "Unhandled rela relocation: R_X86_64_PLT32" error"

Note:
Before loading the kernel and initramfs for kdump, this kexec-tools patch
(http://lists.infradead.org/pipermail/kexec/2018-September/021460.html)
must be merged; the kdump kernel then works correctly. It is needed because
a kernel patch ("x86/ioremap: strengthen the logic in
early_memremap_pgprot_adjust() to adjust encryption mask") that was part of
v6 has been dropped from this series.

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.19-rc5:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit <6bf4ca7fbc85> "Linux 4.19-rc5"

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Changes since v6:
1. Dropped the patch "x86/ioremap: strengthen the logic in
early_memremap_pgprot_adjust() to adjust encryption mask", which was part of
v6. Dave Young suggested removing it and fixing kexec-tools instead.
Reference:
http://lists.infradead.org/pipermail/kexec/2018-September/021460.html
2. Update the patch log.

Changes since v7:
1. Improve patch log for patch 1/4(Suggested by Baoquan He)
2. Add Reviewed-by for all patches(Tom Lendacky )
3. Add Acked-by for patch 3/4(Joerg Roedel )

Some known issues:
1. About SME
The upstream kernel hangs on an HP machine (DL385Gen10, AMD EPYC 7251) when
we execute the kexec command as follows:

# kexec -l /boot/vmlinuz-4.19.0-rc5+ --initrd=/boot/initramfs-4.19.0-rc5+.img 
--command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on 
rd.lvm.lv=rhel_hp-dl385g10-03/root rd.lvm.lv=rhel_hp-dl385g10-03/swap 
console=ttyS0,115200n81 LANG=en_US.UTF-8 earlyprintk=serial debug nokaslr"
# kexec -e (or reboot)

But this issue can not be reproduced on speedway machine, and this iss