[PATCH v2] x86/efi: unconditionally hold the whole low-1MB memory regions

2021-05-31 Thread Lianbo Jiang
Some sub-1MB memory regions may be reserved by EFI boot services, and these
memory regions will be released later in efi_free_boot_services().

Currently, all sub-1MB memory regions are reserved when the crashkernel
option is specified, but unfortunately EFI boot services may have already
reserved some sub-1MB memory regions before crash_reserve_low_1M() is
called. As a result, crash_reserve_low_1M() only owns the remaining sub-1MB
memory regions, not all of them, because EFI boot services will subsequently
free the sub-1MB regions it reserved. Eventually, DMA is able to allocate
memory from the sub-1MB area and causes the following error:

crash> kmem -s |grep invalid
kmem: dma-kmalloc-512: slab: d52c40001900 invalid freepointer: 
9403c0067300
kmem: dma-kmalloc-512: slab: d52c40001900 invalid freepointer: 
9403c0067300
crash> vtop 9403c0067300
VIRTUAL   PHYSICAL
9403c0067300  67300   --->The physical address falls into this range 
[0x00063000-0x0008efff]

kernel debugging log:
...
[0.008927] memblock_reserve: [0x0001-0x00013fff] 
efi_reserve_boot_services+0x85/0xd0
[0.008930] memblock_reserve: [0x00063000-0x0008efff] 
efi_reserve_boot_services+0x85/0xd0
...
[0.009425] memblock_reserve: [0x-0x000f] 
crash_reserve_low_1M+0x2c/0x49
...
[0.010586] Zone ranges:
[0.010587]   DMA  [mem 0x1000-0x00ff]
[0.010589]   DMA32[mem 0x0100-0x]
[0.010591]   Normal   [mem 0x0001-0x000c7fff]
[0.010593]   Device   empty
...
[8.814894] __memblock_free_late: [0x00063000-0x0008efff] 
efi_free_boot_services+0x14b/0x23b
[8.815793] __memblock_free_late: [0x0001-0x00013fff] 
efi_free_boot_services+0x14b/0x23b

To fix the above issue, let's hold the whole low-1M memory region
unconditionally in efi_free_boot_services().

Signed-off-by: Lianbo Jiang 
---
Background (copied from bhe's comment on the v1 patch):

The kdump kernel also needs to go through the real mode code path during
bootup. It is no different from a normal kernel except that it skips the
firmware resetting. So the kdump kernel needs the low 1M as system RAM just
as a normal kernel does. Here we reserve the whole low 1M with
memblock_reserve() to avoid any later kernel or driver data residing in
this area. Otherwise, we would need to dump the content of this area to the
vmcore. As we know, when a crash happens, the old memory of the 1st kernel
should stay untouched until vmcore dumping has read out its content.
Meanwhile, the kdump kernel needs to reuse the low 1M. In the past, we used
a backup region to copy out the low 1M area, and mapped the backup region
into the low 1M area in the vmcore ELF file. In commit 6f599d84231fd27
("x86/kdump: Always reserve the low 1M when the crashkernel option is
specified"), we changed to lock down the whole low 1M to avoid writing any
kernel data into it; this way we can skip this area when dumping the vmcore.

Above is why we memblock-reserve the whole low 1M. We don't want to use it;
we just don't want anyone else to use it in the 1st kernel.
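
For illustration, here is a minimal userspace sketch of the intended behaviour;
free_region() and the example ranges are placeholders rather than the kernel
API, and the real change operates on EFI memory descriptors inside
efi_free_boot_services(). The idea is simply that nothing below 1M is ever
handed back, and a range straddling the 1M boundary only has its upper part
freed.

#include <stdio.h>

#define SZ_1M (1ULL << 20)

/* Hypothetical stand-in for handing memory back to the allocator. */
static void free_region(unsigned long long start, unsigned long long size)
{
        printf("free [%#llx-%#llx]\n", start, start + size - 1);
}

/*
 * Sketch: never free anything below 1M, so the whole low 1M stays
 * reserved regardless of what EFI boot services used.
 */
static void free_boot_services_range(unsigned long long start,
                                     unsigned long long size)
{
        unsigned long long end = start + size;

        if (end <= SZ_1M)
                return;                 /* entirely below 1M: keep it reserved */

        if (start < SZ_1M) {            /* straddles 1M: clamp to the part above */
                size -= SZ_1M - start;
                start = SZ_1M;
        }

        free_region(start, size);
}

int main(void)
{
        free_boot_services_range(0x10000, 0x4000);      /* kept, below 1M */
        free_boot_services_range(0x63000, 0x2c000);     /* kept, below 1M */
        free_boot_services_range(0xf0000, 0x20000);     /* only [0x100000, 0x10ffff] freed */
        return 0;
}

With this kind of clamping, crash_reserve_low_1M() keeps ownership of the
whole low 1M even when EFI boot services reserved parts of it first.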


 arch/x86/platform/efi/quirks.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 7850111008a8..840b7e3b3d48 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -409,7 +410,7 @@ void __init efi_free_boot_services(void)
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
-   size_t rm_size;
+   unsigned long long end = start + size;
 
if (md->type != EFI_BOOT_SERVICES_CODE &&
md->type != EFI_BOOT_SERVICES_DATA) {
@@ -431,23 +432,20 @@ void __init efi_free_boot_services(void)
efi_unmap_pages(md);
 
/*
-* Nasty quirk: if all sub-1MB memory is used for boot
-* services, we can get here without having allocated the
-* real mode trampoline.  It's too late to hand boot services
-* memory back to the memblock allocator, so instead
-* try to manually allocate the trampoline if needed.
-*
-* I've seen this on a Dell XPS 13 9350 with firmware
-* 1.4.4 with SGX enabled booting Linux via Fedora 24's
-* grub2-efi on a hard disk.  (And no, I don't know why
-* this happened, but Linux should still try to boot rather
-* panicking early.)
+* The sub-1MB memory may be within the range[0, SZ_1M]
+* or ac

[PATCH] x86/efi: Do not release sub-1MB memory regions when the crashkernel option is specified

2021-04-07 Thread Lianbo Jiang
Some sub-1MB memory regions may be reserved by EFI boot services, and these
memory regions will be released later in efi_free_boot_services().

Currently, all sub-1MB memory regions are reserved when the crashkernel
option is specified, but unfortunately EFI boot services may have already
reserved some sub-1MB memory regions before crash_reserve_low_1M() is
called. As a result, crash_reserve_low_1M() only owns the remaining sub-1MB
memory regions, not all of them, because EFI boot services will subsequently
free the sub-1MB regions it reserved. Eventually, DMA is able to allocate
memory from the sub-1MB area and causes the following error:

crash> kmem -s |grep invalid
kmem: dma-kmalloc-512: slab: d52c40001900 invalid freepointer: 
9403c0067300
kmem: dma-kmalloc-512: slab: d52c40001900 invalid freepointer: 
9403c0067300
crash> vtop 9403c0067300
VIRTUAL   PHYSICAL
9403c0067300  67300   --->The physical address falls into this range 
[0x00063000-0x0008efff]

kernel debugging log:
...
[0.008927] memblock_reserve: [0x0001-0x00013fff] 
efi_reserve_boot_services+0x85/0xd0
[0.008930] memblock_reserve: [0x00063000-0x0008efff] 
efi_reserve_boot_services+0x85/0xd0
...
[0.009425] memblock_reserve: [0x-0x000f] 
crash_reserve_low_1M+0x2c/0x49
...
[0.010586] Zone ranges:
[0.010587]   DMA  [mem 0x1000-0x00ff]
[0.010589]   DMA32[mem 0x0100-0x]
[0.010591]   Normal   [mem 0x0001-0x000c7fff]
[0.010593]   Device   empty
...
[8.814894] __memblock_free_late: [0x00063000-0x0008efff] 
efi_free_boot_services+0x14b/0x23b
[8.815793] __memblock_free_late: [0x0001-0x00013fff] 
efi_free_boot_services+0x14b/0x23b

Do not release sub-1MB memory regions even though they were reserved by
EFI boot services, so that all sub-1MB memory regions remain reserved when
the crashkernel option is specified.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/platform/efi/quirks.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 67d93a243c35..637f932c4fd4 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define EFI_MIN_RESERVE 5120
 
@@ -303,6 +304,19 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
size)
  */
 static __init bool can_free_region(u64 start, u64 size)
 {
+   /*
+* Some sub-1MB memory regions may be reserved by EFI boot
+* services, and these memory regions will be released later
+* in the efi_free_boot_services().
+*
+* Do not release sub-1MB memory regions even though they are
+* reserved by EFI boot services, because all sub-1MB memory
+* must remain reserved when the crashkernel option is specified.
+*/
+   if (cmdline_find_option(boot_command_line, "crashkernel", NULL, 0) > 0
+   && (start + size < (1<<20)))
+   return false;
+
if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
return false;
 
-- 
2.17.1




[PATCH] docs: admin-guide: update kdump documentation due to change of crash URL

2020-09-18 Thread Lianbo Jiang
Since the crash utility has moved to GitHub, the original URL is no longer
available. Let's update it accordingly.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 Documentation/admin-guide/kdump/kdump.rst | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst 
b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..75a9dd98e76e 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -509,9 +509,12 @@ ELF32-format headers using the --elf32-core-headers kernel 
option on the
 dump kernel.
 
 You can also use the Crash utility to analyze dump files in Kdump
-format. Crash is available on Dave Anderson's site at the following URL:
+format. Crash is available at the following URL:
 
-   http://people.redhat.com/~anderson/
+   https://github.com/crash-utility/crash
+
+The Crash documentation can be found at:
+   https://crash-utility.github.io/
 
 Trigger Kdump on WARN()
 ===
-- 
2.17.1




[PATCH 3/3] kexec_file: correctly output debugging information for the PT_LOAD elf header

2020-08-03 Thread Lianbo Jiang
Currently, when we enable the debugging switch to debug kexec_file, we
always get the following wrong results:

kexec_file: Crash PT_LOAD elf header. phdr=c988639b vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=51 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=3cca69a0 vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=52 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=c584cb9f vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=53 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=cf85d57f vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=54 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=a4a8f847 vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=55 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=272ec49f vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=56 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=ea0b65de vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=57 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=1f5e490c vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=58 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=dfe4109e vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=59 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=480ed2b6 vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=60 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=80b65151 vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=61 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=24e31c5e vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=62 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=332e0385 vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=63 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=2754d5da vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=64 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=783320dd vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=65 p_offset=0x0
kexec_file: Crash PT_LOAD elf header. phdr=76fe5b64 vaddr=0x0, 
paddr=0x0, sz=0x0 e_phnum=66 p_offset=0x0

The reason is that the kernel always prints the values of the next PT_LOAD
instead of the current one, because phdr is incremented before the
pr_debug() call. Move the increment after the debug output so that the
correct values are printed.

Signed-off-by: Lianbo Jiang 
---
 kernel/kexec_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 41616b6a80ad..e2c03b4ce31b 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1323,10 +1323,10 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, 
int kernel_map,
phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
phdr->p_align = 0;
ehdr->e_phnum++;
-   phdr++;
pr_debug("Crash PT_LOAD elf header. phdr=%p vaddr=0x%llx, 
paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
ehdr->e_phnum, phdr->p_offset);
+   phdr++;
}
 
*addr = buf;
-- 
2.17.1




[PATCH 1/3] x86/crash: Correct the address boundary of function parameters

2020-08-03 Thread Lianbo Jiang
Let's carefully handle the boundary of the function parameters to make
sure that the arguments passed don't exceed the address range:
crash_exclude_mem_range() takes an inclusive end address, so the low 1M
should be excluded as [0, (1<<20)-1] rather than [0, 1<<20].
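
A tiny sanity check of the arithmetic (purely illustrative, not part of the
patch): crash_exclude_mem_range() treats the end address as inclusive, so the
low 1M is the range [0, 0xfffff].

#include <assert.h>

int main(void)
{
        /* Inclusive end of the low 1M: the last byte below the 1M boundary. */
        unsigned long long low_1m_end = (1ULL << 20) - 1;

        assert(low_1m_end == 0xfffff);            /* [0, 0xfffff] is exactly 1M bytes */
        assert(low_1m_end + 1 == (1ULL << 20));   /* 0x100000 is already the next byte */
        return 0;
}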

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index fd87b59452a3..a8f3af257e26 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -230,7 +230,7 @@ static int elf_header_exclude_ranges(struct crash_mem *cmem)
int ret = 0;
 
/* Exclude the low 1M because it is always reserved */
-   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   ret = crash_exclude_mem_range(cmem, 0, (1<<20)-1);
if (ret)
return ret;
 
-- 
2.17.1




[PATCH 2/3] kexec: Improve the crash_exclude_mem_range() to handle the overlapping ranges

2020-08-03 Thread Lianbo Jiang
The crash_exclude_mem_range() can only handle one memory region at a time.
It fails when the passed-in area covers several memory regions: in that
case it only excludes the first region and then returns, leaving the later
regions untouched.

E.g. on a NEC system with two usable RAM regions inside the low 1M:
...
BIOS-e820: [mem 0x-0x0003efff] usable
BIOS-e820: [mem 0x0003f000-0x0003] reserved
BIOS-e820: [mem 0x0004-0x0009] usable

It only excludes the memory region [0, 0x3efff]; the memory region
[0x4, 0x9] is still added into /proc/vmcore, which may cause the
following failure when dumping the vmcore:

ioremap on RAM at 0x0004 - 0x00040fff
WARNING: CPU: 0 PID: 665 at arch/x86/mm/ioremap.c:186 
__ioremap_caller+0x2c7/0x2e0
...
RIP: 0010:__ioremap_caller+0x2c7/0x2e0
Code: 05 20 47 1c 01 48 09 c5 e9 93 fe ff ff 48 8d 54 24 28 48 8d 74 24 18 48 c7
  c7 85 e7 09 82 c6 05 b4 10 36 01 01 e8 32 91 04 00 <0f> 0b 45 31 ff e9 f3
  fe ff ff e8 2a 8e 04 00 66 2e 0f 1f 84 00 00
RSP: 0018:c971fd60 EFLAGS: 00010286
RAX:  RBX: 0004 RCX: 
RDX: 8880620268c0 RSI: 888062016a08 RDI: 888062016a08
RBP:  R08: 0441 R09: 0048
R10:  R11: c971fc08 R12: 7f794c343000
R13: 1000 R14:  R15: 
FS:  7f794c352800() GS:88806200() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f794c35 CR3: 5df9c005 CR4: 001606b0
Call Trace:
? __copy_oldmem_page.part.0+0x9c/0xb0
__copy_oldmem_page.part.0+0x9c/0xb0
read_from_oldmem.part.2+0xe2/0x140
read_vmcore+0xd8/0x2f0
proc_reg_read+0x39/0x60
vfs_read+0x91/0x140
ksys_read+0x4f/0xb0
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
cp: error reading '/proc/vmcore': Cannot allocate memory
kdump: saving vmcore failed

In order to solve this issue, let's extend the crash_exclude_mem_range()
to handle the overlapping ranges.
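
Below is a minimal userspace sketch of the intended semantics, assuming
inclusive start/end addresses as in struct crash_mem; struct range and
exclude_range() are illustrative stand-ins, not the kernel implementation.
The point is that the scan continues after a fully covered range is dropped,
so a single request can carve memory out of several ranges. With the e820
layout above, excluding the low 1M then leaves no ranges behind, which is
exactly what the old code failed to achieve.

#include <stdio.h>

struct range { unsigned long long start, end; };        /* inclusive */

static int exclude_range(struct range *r, int *nr, int max,
                         unsigned long long mstart, unsigned long long mend)
{
        struct range extra = {0, 0};
        int i, j, have_extra = 0;

        for (i = 0; i < *nr; i++) {
                unsigned long long start = r[i].start, end = r[i].end;

                if (mstart > end || mend < start)
                        continue;                       /* no overlap */

                if (mstart <= start && mend >= end) {
                        /* fully covered: drop it and re-check this index */
                        for (j = i; j < *nr - 1; j++)
                                r[j] = r[j + 1];
                        (*nr)--;
                        i--;
                        continue;
                }

                if (mstart > start && mend < end) {
                        /* hole in the middle: split into two ranges */
                        r[i].end = mstart - 1;
                        extra.start = mend + 1;
                        extra.end = end;
                        have_extra = 1;
                } else if (mstart <= start) {
                        r[i].start = mend + 1;          /* trim the front */
                } else {
                        r[i].end = mstart - 1;          /* trim the tail */
                }
        }

        if (have_extra) {
                if (*nr >= max)
                        return -1;
                r[(*nr)++] = extra;
        }
        return 0;
}

int main(void)
{
        /* The two usable low-memory ranges from the NEC e820 map above. */
        struct range r[4] = {
                { 0x00000000, 0x0003efff },
                { 0x00040000, 0x0009ffff },
        };
        int nr = 2, i;

        exclude_range(r, &nr, 4, 0, (1ULL << 20) - 1);  /* exclude the low 1M */

        for (i = 0; i < nr; i++)
                printf("[%#llx-%#llx]\n", r[i].start, r[i].end);
        /* prints nothing: both ranges fall inside the excluded low 1M */
        return 0;
}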

Signed-off-by: Lianbo Jiang 
---
 kernel/kexec_file.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 09cc78df53c6..41616b6a80ad 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1157,24 +1157,26 @@ int crash_exclude_mem_range(struct crash_mem *mem,
unsigned long long mstart, unsigned long long mend)
 {
int i, j;
-   unsigned long long start, end;
+   unsigned long long start, end, p_start, p_end;
struct crash_mem_range temp_range = {0, 0};
 
for (i = 0; i < mem->nr_ranges; i++) {
start = mem->ranges[i].start;
end = mem->ranges[i].end;
+   p_start = mstart;
+   p_end = mend;
 
if (mstart > end || mend < start)
continue;
 
/* Truncate any area outside of range */
if (mstart < start)
-   mstart = start;
+   p_start = start;
if (mend > end)
-   mend = end;
+   p_end = end;
 
/* Found completely overlapping range */
-   if (mstart == start && mend == end) {
+   if (p_start == start && p_end == end) {
mem->ranges[i].start = 0;
mem->ranges[i].end = 0;
if (i < mem->nr_ranges - 1) {
@@ -1185,20 +1187,29 @@ int crash_exclude_mem_range(struct crash_mem *mem,
mem->ranges[j].end =
mem->ranges[j+1].end;
}
+
+   /*
+* Continue to check if there are other overlapping ranges
+* from the current position because of shifting the above
+* mem ranges.
+*/
+   i--;
+   mem->nr_ranges--;
+   continue;
}
mem->nr_ranges--;
return 0;
}
 
-   if (mstart > start && mend < end) {
+   if (p_start > start && p_end < end) {
/* Split original range */
-   mem->ranges[i].end = mstart - 1;
-   temp_range.start = mend + 1;
+   mem->ranges[i].end = p_start - 1;
+   temp_range.start = p_end + 1;
temp_range.end = end;
-   } e

[PATCH 0/3] x86/kexec_file: Fix some corners bugs and improve the crash_exclude_mem_range()

2020-08-03 Thread Lianbo Jiang
This series includes the following patches; it fixes some corner-case bugs
and improves the crash_exclude_mem_range().

[1] [PATCH 1/3] x86/crash: Correct the address boundary of function
parameters
[2] [PATCH 2/3] kexec: Improve the crash_exclude_mem_range() to handle
the overlapping ranges
[3] [PATCH 3/3] kexec_file: correctly output debugging information for
the PT_LOAD elf header

Lianbo Jiang (3):
  x86/crash: Correct the address boundary of function parameters
  kexec: Improve the crash_exclude_mem_range() to handle the overlapping
ranges
  kexec_file: correctly output debugging information for the PT_LOAD elf
header

 arch/x86/kernel/crash.c |  2 +-
 kernel/kexec_file.c | 33 ++---
 2 files changed, 23 insertions(+), 12 deletions(-)

-- 
2.17.1




[PATCH v2] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-06-01 Thread Lianbo Jiang
Signature verification is an important security feature that protects the
system from being attacked with a kernel of unknown origin. Kexec rebooting
is a way to replace the running kernel, hence it needs to be secured
carefully.

In the current code handling signature verification of the kexec kernel,
the logic is very twisted. It mixes signature verification, IMA signature
appraising and kexec lockdown.

If KEXEC_SIG_FORCE is not set and the kexec kernel image lacks a signature,
the supported crypto, or the key, we don't consider this wrong, unless
kexec lockdown is in force. IMA is considered another kind of signature
appraising method.

If the kexec kernel image has a signature/crypto/key, it has to go through
signature verification and pass; otherwise it is treated as a verification
failure and won't be loaded.

So a kexec kernel image with an unqualified signature is treated as even
worse than one without a signature at all, which sounds very unreasonable.
E.g. if people load an unsigned kernel, or a kernel signed with an expired
key, which one is more dangerous?

So, here, let's simplify the logic to improve code readability. If
KEXEC_SIG_FORCE or kexec lockdown is enabled, signature verification is
mandated. Otherwise, we lift the bar for any kernel image.
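
To make the intended decision flow explicit, here is a small standalone model;
sig_force_enabled, ima_will_appraise and locked_down are placeholder flags
standing in for IS_ENABLED(CONFIG_KEXEC_SIG_FORCE), ima_appraise_signature()
and security_locked_down(), not kernel APIs themselves.

#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the simplified policy: a failed verification only
 * blocks the load when signatures are mandatory, or when lockdown is
 * active and IMA will not appraise the image.
 */
static int validate_signature_policy(int verify_ret, bool sig_force_enabled,
                                     bool ima_will_appraise, bool locked_down)
{
        if (!verify_ret)
                return 0;                       /* signature verified fine */

        if (sig_force_enabled)
                return verify_ret;              /* mandatory signature: reject */

        if (!ima_will_appraise && locked_down)
                return -EPERM;                  /* lockdown without IMA: reject */

        return 0;                               /* otherwise lift the bar */
}

int main(void)
{
        /* unsigned image, no KEXEC_SIG_FORCE, no lockdown: allowed (returns 0) */
        return validate_signature_policy(-ENODATA, false, false, false);
}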

Signed-off-by: Lianbo Jiang 
---
Changes since v1:
[1] Modify the log level (suggested by Jiri Bohac)

 kernel/kexec_file.c | 34 ++
 1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index faa74d5f6941..fae496958a68 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -181,34 +181,19 @@ void kimage_file_post_load_cleanup(struct kimage *image)
 static int
 kimage_validate_signature(struct kimage *image)
 {
-   const char *reason;
int ret;
 
ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
   image->kernel_buf_len);
-   switch (ret) {
-   case 0:
-   break;
+   if (ret) {
 
-   /* Certain verification errors are non-fatal if we're not
-* checking errors, provided we aren't mandating that there
-* must be a valid signature.
-*/
-   case -ENODATA:
-   reason = "kexec of unsigned image";
-   goto decide;
-   case -ENOPKG:
-   reason = "kexec of image with unsupported crypto";
-   goto decide;
-   case -ENOKEY:
-   reason = "kexec of image with unavailable key";
-   decide:
if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
-   pr_notice("%s rejected\n", reason);
+   pr_notice("Enforced kernel signature verification 
failed (%d).\n", ret);
return ret;
}
 
-   /* If IMA is guaranteed to appraise a signature on the kexec
+   /*
+* If IMA is guaranteed to appraise a signature on the kexec
 * image, permit it even if the kernel is otherwise locked
 * down.
 */
@@ -216,17 +201,10 @@ kimage_validate_signature(struct kimage *image)
security_locked_down(LOCKDOWN_KEXEC))
return -EPERM;
 
-   return 0;
-
-   /* All other errors are fatal, including nomem, unparseable
-* signatures and signature check failures - even if signatures
-* aren't required.
-*/
-   default:
-   pr_notice("kernel signature verification failed (%d).\n", ret);
+   pr_debug("kernel signature verification failed (%d).\n", ret);
}
 
-   return ret;
+   return 0;
 }
 #endif
 
-- 
2.17.1




[PATCH] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-05-24 Thread Lianbo Jiang
Signature verification is an important security feature that protects the
system from being attacked with a kernel of unknown origin. Kexec rebooting
is a way to replace the running kernel, hence it needs to be secured
carefully.

In the current code handling signature verification of the kexec kernel,
the logic is very twisted. It mixes signature verification, IMA signature
appraising and kexec lockdown.

If KEXEC_SIG_FORCE is not set and the kexec kernel image lacks a signature,
the supported crypto, or the key, we don't consider this wrong, unless
kexec lockdown is in force. IMA is considered another kind of signature
appraising method.

If the kexec kernel image has a signature/crypto/key, it has to go through
signature verification and pass; otherwise it is treated as a verification
failure and won't be loaded.

So a kexec kernel image with an unqualified signature is treated as even
worse than one without a signature at all, which sounds very unreasonable.
E.g. if people load an unsigned kernel, or a kernel signed with an expired
key, which one is more dangerous?

So, here, let's simplify the logic to improve code readability. If
KEXEC_SIG_FORCE or kexec lockdown is enabled, signature verification is
mandated. Otherwise, we lift the bar for any kernel image.

Signed-off-by: Lianbo Jiang 
---
 kernel/kexec_file.c | 37 ++---
 1 file changed, 6 insertions(+), 31 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index faa74d5f6941..e4bdf0c42f35 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -181,52 +181,27 @@ void kimage_file_post_load_cleanup(struct kimage *image)
 static int
 kimage_validate_signature(struct kimage *image)
 {
-   const char *reason;
int ret;
 
ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
   image->kernel_buf_len);
-   switch (ret) {
-   case 0:
-   break;
+   if (ret) {
+   pr_debug("kernel signature verification failed (%d).\n", ret);
 
-   /* Certain verification errors are non-fatal if we're not
-* checking errors, provided we aren't mandating that there
-* must be a valid signature.
-*/
-   case -ENODATA:
-   reason = "kexec of unsigned image";
-   goto decide;
-   case -ENOPKG:
-   reason = "kexec of image with unsupported crypto";
-   goto decide;
-   case -ENOKEY:
-   reason = "kexec of image with unavailable key";
-   decide:
-   if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
-   pr_notice("%s rejected\n", reason);
+   if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE))
return ret;
-   }
 
-   /* If IMA is guaranteed to appraise a signature on the kexec
+   /*
+* If IMA is guaranteed to appraise a signature on the kexec
 * image, permit it even if the kernel is otherwise locked
 * down.
 */
if (!ima_appraise_signature(READING_KEXEC_IMAGE) &&
security_locked_down(LOCKDOWN_KEXEC))
return -EPERM;
-
-   return 0;
-
-   /* All other errors are fatal, including nomem, unparseable
-* signatures and signature check failures - even if signatures
-* aren't required.
-*/
-   default:
-   pr_notice("kernel signature verification failed (%d).\n", ret);
}
 
-   return ret;
+   return 0;
 }
 #endif
 
-- 
2.17.1




[PATCH v2] kexec: support parsing the string "Reserved" to get the correct e820 reserved region

2020-02-23 Thread Lianbo Jiang
When loading the kernel and initramfs for kexec, kexec-tools gets the e820
reserved regions from "/proc/iomem" in order to rebuild the e820 ranges for
the kexec kernel, but "/proc/iomem" may contain the string "Reserved"
(capitalized), which causes the parsing to fail. For example:

 #cat /proc/iomem|grep -i reserved
-0fff : Reserved
7f338000-7f34dfff : Reserved
7f3cd000-8fff : Reserved
f17f-f17f1fff : Reserved
fe00- : Reserved

Currently, kexec-tools cannot handle the above case because memcmp() is
case sensitive when comparing the strings.

So, let's fix this corner case and make sure that both "reserved" and
"Reserved" in "/proc/iomem" are parsed appropriately.

Signed-off-by: Lianbo Jiang 
---
Note:
Please also see the related kdump fix in the commit below.
1ac3e4a57000 ("kdump: fix an error that can not parse the e820 reserved region")
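
To illustrate why memcmp() misses the capitalized form while strncasecmp()
handles both spellings, here is a tiny standalone example; the sample line is
hard-coded for illustration rather than read from a live /proc/iomem:

#include <stdio.h>
#include <string.h>
#include <strings.h>

int main(void)
{
        const char *line = "Reserved\n";        /* as reported by newer kernels */

        /* Case-sensitive compare misses the capitalized form... */
        printf("memcmp:      %s\n",
               memcmp(line, "reserved\n", 9) == 0 ? "match" : "no match");

        /* ...while the case-insensitive compare accepts both spellings. */
        printf("strncasecmp: %s\n",
               strncasecmp(line, "reserved\n", 9) == 0 ? "match" : "no match");
        return 0;
}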

Changes since v1:
[1] use strncasecmp() instead of introducing another 'else-if' (suggested by Bhupesh)

 kexec/arch/i386/kexec-x86-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kexec/arch/i386/kexec-x86-common.c 
b/kexec/arch/i386/kexec-x86-common.c
index 61ea19380ab2..9303704a0714 100644
--- a/kexec/arch/i386/kexec-x86-common.c
+++ b/kexec/arch/i386/kexec-x86-common.c
@@ -90,7 +90,7 @@ static int get_memory_ranges_proc_iomem(struct memory_range 
**range, int *ranges
if (memcmp(str, "System RAM\n", 11) == 0) {
type = RANGE_RAM;
}
-   else if (memcmp(str, "reserved\n", 9) == 0) {
+   else if (strncasecmp(str, "reserved\n", 9) == 0) {
type = RANGE_RESERVED;
}
else if (memcmp(str, "ACPI Tables\n", 12) == 0) {
-- 
2.17.1




[PATCH] kexec: support parsing the string "Reserved" to get the correct e820 reserved region

2020-02-12 Thread Lianbo Jiang
When loading the kernel and initramfs for kexec, kexec-tools gets the e820
reserved regions from "/proc/iomem" in order to rebuild the e820 ranges for
the kexec kernel, but "/proc/iomem" may contain the string "Reserved"
(capitalized), which causes the parsing to fail. For example:

 #cat /proc/iomem|grep -i reserved
-0fff : Reserved
7f338000-7f34dfff : Reserved
7f3cd000-8fff : Reserved
f17f-f17f1fff : Reserved
fe00- : Reserved

Currently, kexec-tools cannot handle the above case because memcmp() is
case sensitive when comparing the strings.

So, let's fix this corner case and make sure that both "reserved" and
"Reserved" in "/proc/iomem" are parsed appropriately.

Signed-off-by: Lianbo Jiang 
---
Note:
Please also see the related kdump fix in the commit below.
1ac3e4a57000 ("kdump: fix an error that can not parse the e820 reserved region")

 kexec/arch/i386/kexec-x86-common.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kexec/arch/i386/kexec-x86-common.c 
b/kexec/arch/i386/kexec-x86-common.c
index 61ea19380ab2..86bcc8c0677e 100644
--- a/kexec/arch/i386/kexec-x86-common.c
+++ b/kexec/arch/i386/kexec-x86-common.c
@@ -93,6 +93,9 @@ static int get_memory_ranges_proc_iomem(struct memory_range 
**range, int *ranges
else if (memcmp(str, "reserved\n", 9) == 0) {
type = RANGE_RESERVED;
}
+   else if (memcmp(str, "Reserved\n", 9) == 0) {
+   type = RANGE_RESERVED;
+   }
else if (memcmp(str, "ACPI Tables\n", 12) == 0) {
type = RANGE_ACPI;
}
-- 
2.17.1




[PATCH 3/3 v9] kexec: Fix i386 build warnings that missed declaration of struct kimage

2019-11-08 Thread Lianbo Jiang
Kbuild test robot reported some build warnings as follows:

arch/x86/include/asm/crash.h:5:32: warning: 'struct kimage' declared
inside parameter list will not be visible outside of this definition
or declaration
int crash_load_segments(struct kimage *image);
   ^~
int crash_copy_backup_region(struct kimage *image);
^~
int crash_setup_memmap_entries(struct kimage *image,
  ^~
The 'struct kimage' is defined in the header file include/linux/kexec.h;
before using it, one needs to include that header file or add a forward
declaration, otherwise the above warnings are triggered.

Add a declaration of struct kimage to arch/x86/include/asm/crash.h, which
solves these compile warnings.

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Reported-by: kbuild test robot 
Signed-off-by: Lianbo Jiang 
Link: https://lkml.kernel.org/r/201910310233.ejrttmwp%25...@intel.com
---
 arch/x86/include/asm/crash.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index 3dff55f4ed9d..88eadd08ad70 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_CRASH_H
 #define _ASM_X86_CRASH_H
 
+struct kimage;
+
 int crash_load_segments(struct kimage *image);
 int crash_copy_backup_region(struct kimage *image);
 int crash_setup_memmap_entries(struct kimage *image,
-- 
2.17.1




[PATCH 2/3 v9] x86/kdump: clean up all the code related to the backup region

2019-11-08 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the low 1M
memory will always be reserved, which ensures that memory allocated later
won't fall into the low 1M area. Therefore, it's no longer necessary to
create a backup region or to copy the first 640k of content into it.

Now the code related to the backup region can be safely removed, so let's
clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index db2301afade5..40b04b6eb675 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -188,8 +188,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -232,6 +230,11 @@ static int elf_header_exclude_ranges(struct crash_mem 
*cmem)
 {
int ret = 0;
 
+   /* Exclude the low 1M because it is always reserved */
+   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   if (ret)
+   return ret;
+
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (ret)
@@ -261,9 +264,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -282,22 +283,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem,
IS_ENABLED(CONFIG_X86_64), addr, sz);
-   if (ret)
-   goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -336,19 +322,11 @@ static int memmap_exclude_ranges(struct kimage *image, 
struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
cmem->nr_ranges = 1;
 
-   /* Exclude Backup region */
-   start = image->arch.backup_load_addr;
-   end = start + image->arch.backup_src_sz - 1;
-   ret = crash_exclude_mem_

[PATCH 1/3 v9] x86/kdump: always reserve the low 1M when the crashkernel option is specified

2019-11-08 Thread Lianbo Jiang
The kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the old
memory in this area may be accessed; therefore, the kernel has to copy the
contents of the first 640k area to a backup region so that the kdump kernel
can read the old memory of the first 640k area from the backup area, which
is done in purgatory().

But the current handling of copying the first 640k area runs into problems
when SME is enabled: the kernel does not properly copy this old memory to
the backup area in purgatory(), so the kdump kernel reads out encrypted
contents, because the kdump kernel must access the first kernel's memory
with the encryption bit set when SME is enabled in the first kernel. Please
refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Finally, it causes the following errors, and the crash tool gets
invalid pointers when parsing the vmcore.

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified, let's
reserve the remaining low 1M memory (after reserving the real mode memory)
so that allocated memory does not fall into the low 1M area. This removes
the need to copy the first 640k of content to a backup region in
purgatory(), and means the low 1M does not need to be included in crash
dumps or used for anything except the processor trampolines that must live
there.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/crash.h |  6 ++
 arch/x86/kernel/crash.c  | 15 +++
 arch/x86/realmode/init.c |  2 ++
 3 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index 0acf5ee45a21..3dff55f4ed9d 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -8,4 +8,10 @@ int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
 void crash_smp_send_stop(void);
 
+#ifdef CONFIG_KEXEC_CORE
+void __init crash_reserve_low_1M(void);
+#else
+static inline void __init crash_reserve_low_1M(void) { }
+#endif
+
 #endif /* _ASM_X86_CRASH_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..db2301afade5 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
@@ -68,6 +70,19 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
rcu_read_unlock();
 }
 
+/*
+ * When the crashkernel option is specified, only use the low
+ * 1M for the real mode trampoline.
+ */
+void __init crash_reserve_low_1M(void)
+{
+   if (cmdline_find_option(boot_command_line, "crashkernel",
+   NULL, 0) > 0) {
+   memblock_reserve(0, 1<<20);
+   pr_info("Reserving the low 1M of memory for crashkernel\n");
+   }
+}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
 static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..262f83cad355 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+   crash_reserve_low_1M();
 }
 
 static void __init setup_real_mode(void)
-- 
2.17.1




[PATCH 0/3 v9] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-11-08 Thread Lianbo Jiang
In purgatory(), the main things done are as below:

[1] Verify sha256 hashes for various segments.
Let's keep this code and not touch the logic.

[2] Copy the first 640k of content to a backup region.
Let's safely remove it and clean up all code related to the backup region.

This patch series removes the backup region, because the current handling
of copying the first 640k runs into problems when SME is active
(https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1M region will always be reserved when the crashkernel kernel
command line option is specified. And this way makes it unnecessary to
do anything with the low 1M region, because the memory allocated later
won't fall into the low 1M area.

This series includes three patches:
[1] x86/kdump: always reserve the low 1M when the crashkernel option
is specified
The low 1M region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1M area.

[2] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

[3] kexec: Fix i386 build warnings that missed declaration of struct
kimage

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1M region when the crashkernel option is only
specified.(Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log

[2] Improve the third patch based on Eric's suggestions

Changes since v4:
[1] Correct some typos, and also improve the first patch's log

[2] Add a new function kexec_reserve_low_1MiB() in kernel/kexec_core.c
and which is called by reserve_real_mode(). (Suggested by Boris)

Changes since v5:
[1] Call the cmdline_find_option() instead of strstr() to check the
crashkernel option. (Suggested by Hatayama)

[2] Add a weak function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
and implement the kexec_reserve_low_1MiB() in arch/x86/kernel/
machine_kexec_64.c so that it does not cause the compile error
on non-x86 kernel, and also ensures that it can work well on x86
kernel.

Changes since v6:
[1] Move the kexec_reserve_low_1MiB() to arch/x86/kernel/crash.c and
also move its declaration function to arch/x86/include/asm/crash.h
(Suggested by Dave Young)

[2] Adjust the corresponding header files.

Changes since v7:
[1] Change the function name from kexec_reserve_low_1MiB() to
crash_reserve_low_1M().

[2] Fix some warnings reported by kbuild.

Changes since v8:
[1] Fix some build warnings reported by kbuild and make that a separate
patch.

Lianbo Jiang (3):
  x86/kdump: always reserve the low 1M when the crashkernel option is
specified
  x86/kdump: clean up all the code related to the backup region
  kexec: Fix i386 build warnings that missed declaration of struct
kimage

 arch/x86/include/asm/crash.h   |   8 +++
 arch/x86/include/asm/kexec.h   |  10 ---
 arch/x86/include/asm/purgatory.h   |  10 ---
 arch/x86/kernel/crash.c| 102 -
 arch/x86/kernel/machine_kexec_64.c |  47 -
 arch/x86/purgatory/purgatory.c |  19 --
 arch/x86/realmode/init.c   |   2 +
 7 files changed, 36 insertions(+), 162 deletions(-)

-- 
2.17.1




[PATCH 2/2 RESEND v8] x86/kdump: clean up all the code related to the backup region

2019-10-30 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the low 1M
memory will always be reserved, which ensures that memory allocated later
won't fall into the low 1M area. Therefore, it's no longer necessary to
create a backup region or to copy the first 640k of content into it.

Now the code related to the backup region can be safely removed, so let's
clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index db2301afade5..40b04b6eb675 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -188,8 +188,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -232,6 +230,11 @@ static int elf_header_exclude_ranges(struct crash_mem 
*cmem)
 {
int ret = 0;
 
+   /* Exclude the low 1M because it is always reserved */
+   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   if (ret)
+   return ret;
+
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (ret)
@@ -261,9 +264,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -282,22 +283,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem,
IS_ENABLED(CONFIG_X86_64), addr, sz);
-   if (ret)
-   goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -336,19 +322,11 @@ static int memmap_exclude_ranges(struct kimage *image, 
struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
cmem->nr_ranges = 1;
 
-   /* Exclude Backup region */
-   start = image->arch.backup_load_addr;
-   end = start + image->arch.backup_src_sz - 1;
-   ret = crash_exclude_mem_

[PATCH 1/2 RESEND v8] x86/kdump: always reserve the low 1M when the crashkernel option is specified

2019-10-30 Thread Lianbo Jiang
The kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the old
memory in this area may be accessed; therefore, the kernel has to copy the
contents of the first 640k area to a backup region so that the kdump kernel
can read the old memory of the first 640k area from the backup area, which
is done in purgatory().

But the current handling of copying the first 640k area runs into problems
when SME is enabled: the kernel does not properly copy this old memory to
the backup area in purgatory(), so the kdump kernel reads out encrypted
contents, because the kdump kernel must access the first kernel's memory
with the encryption bit set when SME is enabled in the first kernel. Please
refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Finally, it causes the following errors, and the crash tool gets
invalid pointers when parsing the vmcore.

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified, let's
reserve the remaining low 1M memory (after reserving the real mode memory)
so that allocated memory does not fall into the low 1M area. This removes
the need to copy the first 640k of content to a backup region in
purgatory(), and means the low 1M does not need to be included in crash
dumps or used for anything except the processor trampolines that must live
there.

Signed-off-by: Lianbo Jiang 
Reported-by: kbuild test robot 
---
 arch/x86/include/asm/crash.h |  8 
 arch/x86/kernel/crash.c  | 15 +++
 arch/x86/realmode/init.c |  2 ++
 3 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index 0acf5ee45a21..88eadd08ad70 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -2,10 +2,18 @@
 #ifndef _ASM_X86_CRASH_H
 #define _ASM_X86_CRASH_H
 
+struct kimage;
+
 int crash_load_segments(struct kimage *image);
 int crash_copy_backup_region(struct kimage *image);
 int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
 void crash_smp_send_stop(void);
 
+#ifdef CONFIG_KEXEC_CORE
+void __init crash_reserve_low_1M(void);
+#else
+static inline void __init crash_reserve_low_1M(void) { }
+#endif
+
 #endif /* _ASM_X86_CRASH_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..db2301afade5 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
@@ -68,6 +70,19 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
rcu_read_unlock();
 }
 
+/*
+ * When the crashkernel option is specified, only use the low
+ * 1M for the real mode trampoline.
+ */
+void __init crash_reserve_low_1M(void)
+{
+   if (cmdline_find_option(boot_command_line, "crashkernel",
+   NULL, 0) > 0) {
+   memblock_reserve(0, 1<<20);
+   pr_info("Reserving the low 1M of memory for crashkernel\n");
+   }
+}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
 static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..262f83cad355 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+   crash_reserve_low_1M();
 }
 
 static void __init setup_real_mode(void)
-- 
2.17.1




[PATCH 0/2 RESEND v8] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-30 Thread Lianbo Jiang
In purgatory(), the main things done are as below:

[1] Verify sha256 hashes for various segments.
Let's keep this code and not touch the logic.

[2] Copy the first 640k of content to a backup region.
Let's safely remove it and clean up all code related to the backup region.

This patch series removes the backup region, because the current handling
of copying the first 640k runs into problems when SME is active
(https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1M region will always be reserved when the crashkernel kernel
command line option is specified. And this way makes it unnecessary to
do anything with the low 1M region, because the memory allocated later
won't fall into the low 1M area.

This series includes two patches:
[1] x86/kdump: always reserve the low 1M when the crashkernel option
is specified
The low 1M region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1M area.

[2] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1M region when the crashkernel option is only
specified.(Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log

[2] Improve the third patch based on Eric's suggestions

Changes since v4:
[1] Correct some typos, and also improve the first patch's log

[2] Add a new function kexec_reserve_low_1MiB() in kernel/kexec_core.c
and which is called by reserve_real_mode(). (Suggested by Boris)

Changes since v5:
[1] Call the cmdline_find_option() instead of strstr() to check the
crashkernel option. (Suggested by Hatayama)

[2] Add a weak function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
and implement the kexec_reserve_low_1MiB() in arch/x86/kernel/
machine_kexec_64.c so that it does not cause the compile error
on non-x86 kernel, and also ensures that it can work well on x86
kernel.

Changes since v6:
[1] Move the kexec_reserve_low_1MiB() to arch/x86/kernel/crash.c and
also move its declaration function to arch/x86/include/asm/crash.h
(Suggested by Dave Young)

[2] Adjust the corresponding header files.

Changes since v7:
[1] Change the function name from kexec_reserve_low_1MiB() to
crash_reserve_low_1M().

[2] Fix some warnings reported by kbuild.

Lianbo Jiang (2):
  x86/kdump: always reserve the low 1M when the crashkernel option is
specified
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/crash.h   |   8 +++
 arch/x86/include/asm/kexec.h   |  10 ---
 arch/x86/include/asm/purgatory.h   |  10 ---
 arch/x86/kernel/crash.c| 102 -
 arch/x86/kernel/machine_kexec_64.c |  47 -
 arch/x86/purgatory/purgatory.c |  19 --
 arch/x86/realmode/init.c   |   2 +
 7 files changed, 36 insertions(+), 162 deletions(-)

-- 
2.17.1




[PATCH 2/2 v8] x86/kdump: clean up all the code related to the backup region

2019-10-29 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the low 1M
memory will always be reserved, which ensures that memory allocated later
won't fall into the low 1M area. Therefore, it's no longer necessary to
create a backup region or to copy the first 640k of content into it.

Now the code related to the backup region can be safely removed, so let's
clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index db2301afade5..40b04b6eb675 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -188,8 +188,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -232,6 +230,11 @@ static int elf_header_exclude_ranges(struct crash_mem 
*cmem)
 {
int ret = 0;
 
+   /* Exclude the low 1M because it is always reserved */
+   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   if (ret)
+   return ret;
+
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (ret)
@@ -261,9 +264,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -282,22 +283,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem,
IS_ENABLED(CONFIG_X86_64), addr, sz);
-   if (ret)
-   goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -336,19 +322,11 @@ static int memmap_exclude_ranges(struct kimage *image, 
struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
cmem->nr_ranges = 1;
 
-   /* Exclude Backup region */
-   start = image->arch.backup_load_addr;
-   end = start + image->arch.backup_src_sz - 1;
-   ret = crash_exclude_mem_

[PATCH 0/2 v8] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-29 Thread Lianbo Jiang
In purgatory(), the main things are as below:

[1] verify sha256 hashes for various segments.
Let's keep this code and not touch the logic.

[2] copy the first 640k content to a backup region.
Let's safely remove it and clean up all code related to the backup
region; a rough sketch of the removed copy follows this list.
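
The removed copy is, in essence, a memcpy() of the low 640k into the
backup segment, driven by the purgatory_backup_* symbols that the
kexec loader fills in. A rough, illustrative sketch only (not the
literal removed code):

static void copy_backup_region(void)
{
	/* purgatory_backup_{dest,src,sz} are set via kexec symbol lookup */
	if (purgatory_backup_dest && purgatory_backup_src)
		memcpy((void *)purgatory_backup_dest,
		       (void *)purgatory_backup_src, purgatory_backup_sz);
}

void purgatory(void)
{
	/* [1] segment hash verification stays as-is */
	if (verify_sha256_digest())
		for (;;)
			;	/* hashes do not match: stop here */

	/* [2] the 640k backup copy is what this series removes */
	copy_backup_region();
}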

This patch series will remove the backup region, because the current
handling of copying the first 640k runs into problems when SME is
active (https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1M region will always be reserved when the crashkernel kernel
command line option is specified. This makes it unnecessary to do
anything else with the low 1M region, because the memory allocated
later won't fall into the low 1M area.
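
Condensed, the approach in the two patches below boils down to these
two pieces (an illustrative summary of the diffs, not extra code):

/* 1) Boot time (patch 1): keep later allocations out of the low 1M;
 *    called from reserve_real_mode(). */
void __init crash_reserve_low_1M(void)
{
	if (cmdline_find_option(boot_command_line, "crashkernel",
				NULL, 0) > 0) {
		memblock_reserve(0, 1 << 20);
		pr_info("Reserving the low 1M of memory for crashkernel\n");
	}
}

/* 2) kexec_file_load() time (patch 2): leave the low 1M out of the
 *    vmcore program headers instead of redirecting it to a backup
 *    segment; this becomes the first step of elf_header_exclude_ranges(). */
ret = crash_exclude_mem_range(cmem, 0, 1 << 20);
if (ret)
	return ret;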

This series includes two patches:
[1] x86/kdump: always reserve the low 1M when the crashkernel option
is specified
The low 1M region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1M area.

[2] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1M region only when the crashkernel option is
specified. (Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log

[2] Improve the third patch based on Eric's suggestions

Changes since v4:
[1] Correct some typos, and also improve the first patch's log

[2] Add a new function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
which is called by reserve_real_mode(). (Suggested by Boris)

Changes since v5:
[1] Call the cmdline_find_option() instead of strstr() to check the
crashkernel option. (Suggested by Hatayama)

[2] Add a weak function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
and implement it in arch/x86/kernel/machine_kexec_64.c, so that it
does not cause a compile error on non-x86 kernels and still works
as intended on x86.

Changes since v6:
[1] Move the kexec_reserve_low_1MiB() to arch/x86/kernel/crash.c and
also move its declaration to arch/x86/include/asm/crash.h.
(Suggested by Dave Young)

[2] Adjust the corresponding header files.

Changes since v7:
[1] Change the function name from kexec_reserve_low_1MiB() to
crash_reserve_low_1M().

Lianbo Jiang (2):
  x86/kdump: always reserve the low 1M when the crashkernel option is
specified
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/crash.h   |   6 ++
 arch/x86/include/asm/kexec.h   |  10 ---
 arch/x86/include/asm/purgatory.h   |  10 ---
 arch/x86/kernel/crash.c| 102 -
 arch/x86/kernel/machine_kexec_64.c |  47 -
 arch/x86/purgatory/purgatory.c |  19 --
 arch/x86/realmode/init.c   |   2 +
 7 files changed, 34 insertions(+), 162 deletions(-)

-- 
2.17.1




[PATCH 1/2 v8] x86/kdump: always reserve the low 1M when the crashkernel option is specified

2019-10-29 Thread Lianbo Jiang
Kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the
old memory in this area may be accessed, therefore, kernel has to
copy the contents of the first 640k area to a backup region so that
kdump kernel can read the old memory from the backup area of the
first 640k area, which is done in the purgatory().

But the current handling of copying the first 640k area runs into
problems when SME is enabled: the kernel does not properly copy this
old memory to the backup area in purgatory(), so the kdump kernel
reads out encrypted contents, because the kdump kernel must access
the first kernel's memory with the encryption bit set when SME is
enabled in the first kernel. Please refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Finally, it causes the following errors, and the crash tool gets
invalid pointers when parsing the vmcore.

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified,
let's reserve the remaining low 1M memory (after reserving the real
mode memory) so that memory allocated later does not fall into the
low 1M area. This removes the need to copy the first 640k of content
to a backup region in purgatory(): the low 1M does not need to be
included in crash dumps or used for anything except the processor
trampolines that must live there.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/crash.h |  6 ++
 arch/x86/kernel/crash.c  | 15 +++
 arch/x86/realmode/init.c |  2 ++
 3 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index 0acf5ee45a21..3dff55f4ed9d 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -8,4 +8,10 @@ int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
 void crash_smp_send_stop(void);
 
+#ifdef CONFIG_KEXEC_CORE
+void __init crash_reserve_low_1M(void);
+#else
+static inline void __init crash_reserve_low_1M(void) { }
+#endif
+
 #endif /* _ASM_X86_CRASH_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..db2301afade5 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
@@ -68,6 +70,19 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
rcu_read_unlock();
 }
 
+/*
+ * When the crashkernel option is specified, only use the low
+ * 1M for the real mode trampoline.
+ */
+void __init crash_reserve_low_1M(void)
+{
+   if (cmdline_find_option(boot_command_line, "crashkernel",
+   NULL, 0) > 0) {
+   memblock_reserve(0, 1<<20);
+   pr_info("Reserving the low 1M of memory for crashkernel\n");
+   }
+}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
 static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..262f83cad355 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+   crash_reserve_low_1M();
 }
 
 static void __init setup_real_mode(void)
-- 
2.17.1




[PATCH 2/2 v7] x86/kdump: clean up all the code related to the backup region

2019-10-28 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the
low 1MiB of memory will always be reserved, so the memory allocated
later won't fall into the low 1MiB area. Therefore, it is no longer
necessary to create a backup region, nor to copy the first 640k of
content into one.

The code related to the backup region can now be safely removed, so
let's clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 144f519aef29..baf32a3c6f8c 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -188,8 +188,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -232,6 +230,11 @@ static int elf_header_exclude_ranges(struct crash_mem 
*cmem)
 {
int ret = 0;
 
+   /* Exclude the low 1MiB because it is always reserved */
+   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   if (ret)
+   return ret;
+
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (ret)
@@ -261,9 +264,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -282,22 +283,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem,
IS_ENABLED(CONFIG_X86_64), addr, sz);
-   if (ret)
-   goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -336,19 +322,11 @@ static int memmap_exclude_ranges(struct kimage *image, 
struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
cmem->nr_ranges = 1;
 
-   /* Exclude Backup region */
-   start = image->arch.backup_load_addr;
-   end = start + image->arch.backup_src_sz - 1;
-   

[PATCH 1/2 v7] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

2019-10-28 Thread Lianbo Jiang
Kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the
old memory in this area may be accessed, therefore, kernel has to
copy the contents of the first 640k area to a backup region so that
kdump kernel can read the old memory from the backup area of the
first 640k area, which is done in the purgatory().

But the current handling of copying the first 640k area runs into
problems when SME is enabled: the kernel does not properly copy this
old memory to the backup area in purgatory(), so the kdump kernel
reads out encrypted contents, because the kdump kernel must access
the first kernel's memory with the encryption bit set when SME is
enabled in the first kernel. Please refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Finally, it causes the following errors, and the crash tool gets
invalid pointers when parsing the vmcore.

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified,
let's reserve the remaining low 1MiB memory (after reserving the real
mode memory) so that memory allocated later does not fall into the
low 1MiB area. This removes the need to copy the first 640k of
content to a backup region in purgatory(): the low 1MiB does not need
to be included in crash dumps or used for anything except the
processor trampolines that must live there.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/crash.h |  6 ++
 arch/x86/kernel/crash.c  | 15 +++
 arch/x86/realmode/init.c |  2 ++
 3 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index 0acf5ee45a21..3e966a3dc823 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -8,4 +8,10 @@ int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
 void crash_smp_send_stop(void);
 
+#ifdef CONFIG_KEXEC_CORE
+void __init kexec_reserve_low_1MiB(void);
+#else
+static inline void __init kexec_reserve_low_1MiB(void) { }
+#endif
+
 #endif /* _ASM_X86_CRASH_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..144f519aef29 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
@@ -68,6 +70,19 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
rcu_read_unlock();
 }
 
+/*
+ * When the crashkernel option is specified, only use the low
+ * 1MiB for the real mode trampoline.
+ */
+void __init kexec_reserve_low_1MiB(void)
+{
+   if (cmdline_find_option(boot_command_line, "crashkernel",
+   NULL, 0) > 0) {
+   memblock_reserve(0, 1<<20);
+   pr_info("Reserving the low 1MiB of memory for crashkernel\n");
+   }
+}
+
 #if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
 
 static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..b8bbd0017ca8 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+   kexec_reserve_low_1MiB();
 }
 
 static void __init setup_real_mode(void)
-- 
2.17.1




[PATCH 0/2 v7] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-28 Thread Lianbo Jiang
In purgatory(), the main things are as below:

[1] verify sha256 hashes for various segments.
Let's keep this code and not touch the logic.

[2] copy the first 640k content to a backup region.
Let's safely remove it and clean up all code related to the backup region.

This patch series will remove the backup region, because the current
handling of copying the first 640k runs into problems when SME is
active (https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1MiB region will always be reserved when the crashkernel kernel
command line option is specified. This makes it unnecessary to do
anything else with the low 1MiB region, because the memory allocated
later won't fall into the low 1MiB area.

This series includes two patches:
[1] x86/kdump: always reserve the low 1MiB when the crashkernel option
is specified
The low 1MiB region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1MiB area.

[2] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1MiB region only when the crashkernel option is
specified. (Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log

[2] Improve the third patch based on Eric's suggestions

Changes since v4:
[1] Correct some typos, and also improve the first patch's log

[2] Add a new function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
which is called by reserve_real_mode(). (Suggested by Boris)

Changes since v5:
[1] Call the cmdline_find_option() instead of strstr() to check the
crashkernel option. (Suggested by Hatayama)

[2] Add a weak function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
and implement it in arch/x86/kernel/machine_kexec_64.c, so that it
does not cause a compile error on non-x86 kernels and still works
as intended on x86.

Changes since v6:
[1] Move the kexec_reserve_low_1MiB() to arch/x86/kernel/crash.c and
also move its declaration to arch/x86/include/asm/crash.h.
(Suggested by Dave Young)

[2] Adjust the corresponding header files.

Lianbo Jiang (2):
  x86/kdump: always reserve the low 1MiB when the crashkernel option is
specified
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/crash.h   |   6 ++
 arch/x86/include/asm/kexec.h   |  10 ---
 arch/x86/include/asm/purgatory.h   |  10 ---
 arch/x86/kernel/crash.c| 102 -
 arch/x86/kernel/machine_kexec_64.c |  47 -
 arch/x86/purgatory/purgatory.c |  19 --
 arch/x86/realmode/init.c   |   2 +
 7 files changed, 34 insertions(+), 162 deletions(-)

-- 
2.17.1




[PATCH 0/2 v6] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-27 Thread Lianbo Jiang
In purgatory(), the main things are as below:

[1] verify sha256 hashes for various segments.
Let's keep this code and not touch the logic.

[2] copy the first 640k content to a backup region.
Let's safely remove it and clean up all code related to the backup region.

This patch series will remove the backup region, because the current
handling of copying the first 640k runs into problems when SME is
active (https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1MiB region will always be reserved when the crashkernel kernel
command line option is specified. This makes it unnecessary to do
anything else with the low 1MiB region, because the memory allocated
later won't fall into the low 1MiB area.

This series includes two patches:
[1] x86/kdump: always reserve the low 1MiB when the crashkernel option
is specified
The low 1MiB region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1MiB area.

[2] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1MiB region only when the crashkernel option is
specified. (Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log

[2] Improve the third patch based on Eric's suggestions

Changes since v4:
[1] Correct some typos, and also improve the first patch's log

[2] Add a new function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
which is called by reserve_real_mode(). (Suggested by Boris)

Changes since v5:
[1] Call cmdline_find_option() instead of strstr() to check for the
crashkernel option; a short illustration follows this changelog.
(Suggested by Hatayama)

[2] Add a weak function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
and implement it in arch/x86/kernel/machine_kexec_64.c, so that it
does not cause a compile error on non-x86 kernels and still works
as intended on x86.
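
A minimal illustration of why the check moved from strstr() to
cmdline_find_option(); the command line below is a made-up example,
not taken from the patch:

/* strstr() matches a bare substring, so a hypothetical command line
 * such as "root=/dev/sda1 nocrashkernel=..." would also trigger the
 * reservation. */
if (strstr(boot_command_line, "crashkernel="))		/* v5 approach */
	memblock_reserve(0, 1 << 20);

/* cmdline_find_option() walks boot_command_line option by option and
 * returns > 0 only when an actual crashkernel= option is present. */
if (cmdline_find_option(boot_command_line, "crashkernel",
			NULL, 0) > 0)			/* since v6 */
	memblock_reserve(0, 1 << 20);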

Lianbo Jiang (2):
  x86/kdump: always reserve the low 1MiB when the crashkernel option is
specified
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 62 ++---
 arch/x86/purgatory/purgatory.c | 19 ---
 arch/x86/realmode/init.c   |  2 +
 include/linux/kexec.h  |  2 +
 kernel/kexec_core.c|  3 ++
 8 files changed, 33 insertions(+), 162 deletions(-)

-- 
2.17.1




[PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

2019-10-27 Thread Lianbo Jiang
Kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the
old memory in this area may be accessed, therefore, kernel has to
copy the contents of the first 640k area to a backup region so that
kdump kernel can read the old memory from the backup area of the
first 640k area, which is done in the purgatory().

But the current handling of copying the first 640k area runs into
problems when SME is enabled: the kernel does not properly copy this
old memory to the backup area in purgatory(), so the kdump kernel
reads out encrypted contents, because the kdump kernel must access
the first kernel's memory with the encryption bit set when SME is
enabled in the first kernel. Please refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Finally, it causes the following errors, and the crash tool gets
invalid pointers when parsing the vmcore.

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified,
let's reserve the remaining low 1MiB memory (after reserving the real
mode memory) so that memory allocated later does not fall into the
low 1MiB area. This removes the need to copy the first 640k of
content to a backup region in purgatory(): the low 1MiB does not need
to be included in crash dumps or used for anything except the
processor trampolines that must live there.

Signed-off-by: Lianbo Jiang 
---
BTW: I also tried to fix the above problem in purgatory(), but there
are too many restrictions in the purgatory() context; for example, I
can't allocate new memory to create the identity-mapping page table
for the SME case.

Currently, there are two places where the first 640k area is needed:
one is find_trampoline_placement(), the other is reserve_real_mode(),
and the content of the area doesn't matter to either of them.

In addition, all the code related to the backup region also needs to
be cleaned up later.

 arch/x86/kernel/machine_kexec_64.c | 15 +++
 arch/x86/realmode/init.c   |  2 ++
 include/linux/kexec.h  |  2 ++
 kernel/kexec_core.c|  3 +++
 4 files changed, 22 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 5dcd438ad8f2..42d7c15c45f1 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -27,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_ACPI
 /*
@@ -687,3 +689,16 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int 
pages)
 */
set_memory_encrypted((unsigned long)vaddr, pages);
 }
+
+/*
+ * When the crashkernel option is specified, only use the low
+ * 1MiB for the real mode trampoline.
+ */
+void __init kexec_reserve_low_1MiB(void)
+{
+   if (cmdline_find_option(boot_command_line, "crashkernel",
+   NULL, 0) > 0) {
+   memblock_reserve(0, 1<<20);
+   pr_info("Reserving the low 1MiB of memory for crashkernel\n");
+   }
+}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..064cc79a015d 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+   kexec_reserve_low_1MiB();
 }
 
 static void __init setup_real_mode(void)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2e43a4..988bf2de51a7 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 int kexec_crash_loaded(void);
+void __init kexec_reserve_low_1MiB(void);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
 
@@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }
 static inline int kexec_crash_loaded(void) { return 0; }
+static inline void __init kexec_reserve_low_1MiB(void) { }
 #define kexec_in_progress false
 #endif /* CONFIG_KEXEC_CORE */
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 15d70a90b50d..8856047bcdc8 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1213,3 +1213,6 @@ void __weak arch_kexec_protect_crashkres(void)
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+void _

[PATCH 2/2 v6] x86/kdump: clean up all the code related to the backup region

2019-10-27 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the
low 1MiB of memory will always be reserved, so the memory allocated
later won't fall into the low 1MiB area. Therefore, it is no longer
necessary to create a backup region, nor to copy the first 640k of
content into one.

The code related to the backup region can now be safely removed, so
let's clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..ef54b3ffb0f6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -217,6 +215,11 @@ static int elf_header_exclude_ranges(struct crash_mem 
*cmem)
 {
int ret = 0;
 
+   /* Exclude the low 1MiB because it is always reserved */
+   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   if (ret)
+   return ret;
+
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (ret)
@@ -246,9 +249,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -267,22 +268,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem,
IS_ENABLED(CONFIG_X86_64), addr, sz);
-   if (ret)
-   goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -321,19 +307,11 @@ static int memmap_exclude_ranges(struct kimage *image, 
struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
cmem->nr_ranges = 1;
 
-   /* Exclude Backup region */
-   start = image->arch.backup_load_addr;
-   end = start + image->arch.backup_src_sz - 1;
-   

[PATCH 2/2 v5] x86/kdump: clean up all the code related to the backup region

2019-10-23 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the
low 1MiB of memory will always be reserved, so the memory allocated
later won't fall into the low 1MiB area. Therefore, it is no longer
necessary to create a backup region, nor to copy the first 640k of
content into one.

The code related to the backup region can now be safely removed, so
let's clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..ef54b3ffb0f6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -217,6 +215,11 @@ static int elf_header_exclude_ranges(struct crash_mem 
*cmem)
 {
int ret = 0;
 
+   /* Exclude the low 1MiB because it is always reserved */
+   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   if (ret)
+   return ret;
+
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (ret)
@@ -246,9 +249,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -267,22 +268,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem,
IS_ENABLED(CONFIG_X86_64), addr, sz);
-   if (ret)
-   goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -321,19 +307,11 @@ static int memmap_exclude_ranges(struct kimage *image, 
struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
cmem->nr_ranges = 1;
 
-   /* Exclude Backup region */
-   start = image->arch.backup_load_addr;
-   end = start + image->arch.backup_src_sz - 1;
-   

[PATCH 1/2 v5] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

2019-10-23 Thread Lianbo Jiang
Kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the
old memory in this area may be accessed, therefore, kernel has to
copy the contents of the first 640k area to a backup region so that
kdump kernel can read the old memory from the backup area of the
first 640k area, which is done in the purgatory().

But the current handling of copying the first 640k area runs into
problems when SME is enabled: the kernel does not properly copy this
old memory to the backup area in purgatory(), so the kdump kernel
reads out encrypted contents, because the kdump kernel must access
the first kernel's memory with the encryption bit set when SME is
enabled in the first kernel. Please refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Finally, it causes the following errors, and the crash tool gets
invalid pointers when parsing the vmcore.

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified,
let's reserve the remaining low 1MiB memory (after reserving the real
mode memory) so that memory allocated later does not fall into the
low 1MiB area. This removes the need to copy the first 640k of
content to a backup region in purgatory(): the low 1MiB does not need
to be included in crash dumps or used for anything except the
processor trampolines that must live there.

Signed-off-by: Lianbo Jiang 
---
BTW: I also tried to fix the above problem in purgatory(), but there
are too many restrictions in the purgatory() context; for example, I
can't allocate new memory to create the identity-mapping page table
for the SME case.

Currently, there are two places where the first 640k area is needed:
one is find_trampoline_placement(), the other is reserve_real_mode(),
and the content of the area doesn't matter to either of them.

In addition, all the code related to the backup region also needs to
be cleaned up later.

 arch/x86/realmode/init.c |  2 ++
 include/linux/kexec.h|  2 ++
 kernel/kexec_core.c  | 13 +
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..064cc79a015d 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+   kexec_reserve_low_1MiB();
 }
 
 static void __init setup_real_mode(void)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2e43a4..30acf1d738bc 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 int kexec_crash_loaded(void);
+void __init kexec_reserve_low_1MiB(void);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
 
@@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }
 static inline int kexec_crash_loaded(void) { return 0; }
+static inline void __init kexec_reserve_low_1MiB(void) { }
 #define kexec_in_progress false
 #endif /* CONFIG_KEXEC_CORE */
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 15d70a90b50d..5bd89f1fee42 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -70,6 +71,18 @@ struct resource crashk_low_res = {
.desc  = IORES_DESC_CRASH_KERNEL
 };
 
+/*
+ * When the crashkernel option is specified, only use the low
+ * 1MiB for the real mode trampoline.
+ */
+void __init kexec_reserve_low_1MiB(void)
+{
+   if (strstr(boot_command_line, "crashkernel=")) {
+   memblock_reserve(0, 1<<20);
+   pr_info("Reserving the low 1MiB of memory for crashkernel\n");
+   }
+}
+
 int kexec_should_crash(struct task_struct *p)
 {
/*
-- 
2.17.1



[PATCH 0/2 v5] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-23 Thread Lianbo Jiang
In purgatory(), the main things are as below:

[1] verify sha256 hashes for various segments.
Let's keep this code and not touch the logic.

[2] copy the first 640k content to a backup region.
Let's safely remove it and clean up all code related to the backup region.

This patch series will remove the backup region, because the current
handling of copying the first 640k runs into problems when SME is
active (https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1MiB region will always be reserved when the crashkernel kernel
command line option is specified. This makes it unnecessary to do
anything else with the low 1MiB region, because the memory allocated
later won't fall into the low 1MiB area.

This series includes two patches:
[1] x86/kdump: always reserve the low 1MiB when the crashkernel option
is specified
The low 1MiB region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1MiB area.

[2] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1MiB region only when the crashkernel option is
specified. (Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log

[2] Improve the third patch based on Eric's suggestions

Changes since v4:
[1] Correct some typos, and also improve the first patch's log

[2] Add a new function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
which is called by reserve_real_mode(). (Suggested by Boris)

Lianbo Jiang (2):
  x86/kdump: always reserve the low 1MiB when the crashkernel option is
specified
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 arch/x86/realmode/init.c   |  2 +
 include/linux/kexec.h  |  2 +
 kernel/kexec_core.c| 13 +
 8 files changed, 28 insertions(+), 162 deletions(-)

-- 
2.17.1



[PATCH 1/3 v4] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified

2019-10-17 Thread Lianbo Jiang
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Kdump kernel will reuse the first 640k region for several reasons;
for example, the trampoline and the conventional PC system BIOS
region may require memory to be allocated in this area. Obviously,
the kdump kernel will also overwrite the first 640k region.
Therefore, the kernel has to copy the contents of the first 640k
area to a backup area, which is done in purgatory(), because the
vmcore may need the old memory. When the vmcore is dumped, the kdump
kernel reads the old memory from the backup area of the first 640k
area.

The main reason should now be clear: the kernel does not correctly
handle the first 640k region when SME is active, so it does not
properly copy this old memory to the backup area in purgatory().
Therefore, the kdump kernel reads out incorrect contents from the
backup area when dumping the vmcore. The symptom looks like this:

[root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values

  KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
CPUS: 128
DATE: Thu Sep 19 08:31:18 2019
  UPTIME: 00:01:21
LOAD AVERAGE: 0.16, 0.07, 0.02
   TASKS: 1343
NODENAME: amd-ethanol
 RELEASE: 5.3.0-rc7+
 VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
 MACHINE: x86_64  (2195 Mhz)
  MEMORY: 127.9 GB
   PANIC: "Kernel panic - not syncing: sysrq triggered crash"
 PID: 9789
 COMMAND: "bash"
TASK: "89711894ae80  [THREAD_INFO: 89711894ae80]"
 CPU: 83
   STATE: TASK_RUNNING (PANIC)

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

BTW: I also tried to fix the above problem in purgatory(), but there
are too many restrictions in the purgatory() context; for example, I
can't allocate new memory to create the identity-mapping page table
for the SME case.

Currently, there are two places where the first 640k area is needed:
one is find_trampoline_placement(), the other is reserve_real_mode(),
and the content of the area doesn't matter to either of them.

To avoid the above error, when the crashkernel kernel command line
option is specified, let's reserve the remaining low 1MiB memory
(after reserving the real mode memory) so that memory allocated later
does not fall into the low 1MiB area. This removes the need to copy
the first 640k of content to a backup region in purgatory(): the low
1MiB does not need to be included in crash dumps or used for anything
except the processor trampolines that must live there.

In addition, all the code related to the backup region also needs to
be cleaned up later.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/realmode/init.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..1f0492830f2c 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -34,6 +34,17 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+
+#ifdef CONFIG_KEXEC_CORE
+   /*
+* When the crashkernel option is specified, only use the low
+* 1MiB for the real mode trampoline.
+*/
+   if (strstr(boot_command_line, "crashkernel=")) {
+   memblock_reserve(0, 1<<20);
+   pr_info("Reserving the low 1MiB of memory for crashkernel\n");
+   }
+#endif /* CONFIG_KEXEC_CORE */
 }
 
 static void __init setup_real_mode(void)
-- 
2.17.1



[PATCH 3/3 v4] x86/kdump: clean up all the code related to the backup region

2019-10-17 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the
low 1MiB of memory will always be reserved, so the memory allocated
later won't fall into the low 1MiB area. Therefore, it is no longer
necessary to create a backup region, nor to copy the first 640k of
content into one.

The code related to the backup region can now be safely removed, so
let's clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..ef54b3ffb0f6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -217,6 +215,11 @@ static int elf_header_exclude_ranges(struct crash_mem 
*cmem)
 {
int ret = 0;
 
+   /* Exclude the low 1MiB because it is always reserved */
+   ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+   if (ret)
+   return ret;
+
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (ret)
@@ -246,9 +249,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -267,22 +268,7 @@ static int prepare_elf_headers(struct kimage *image, void 
**addr,
/* By default prepare 64bit headers */
ret =  crash_prepare_elf64_headers(cmem,
IS_ENABLED(CONFIG_X86_64), addr, sz);
-   if (ret)
-   goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -321,19 +307,11 @@ static int memmap_exclude_ranges(struct kimage *image, 
struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
cmem->nr_ranges = 1;
 
-   /* Exclude Backup region */
-   start = image->arch.backup_load_addr;
-   end = start + image->arch.backup_src_sz - 1;
-   

[PATCH 0/3 v4] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-17 Thread Lianbo Jiang
In purgatory(), the main things are as below:

[1] verify sha256 hashes for various segments.
Let's keep this code and not touch the logic.

[2] copy the first 640k content to a backup region.
Let's safely remove it and clean up all code related to the backup region.

This patch series will remove the backup region, because the current
handling of copying the first 640k runs into problems when SME is
active (https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1MiB region will always be reserved when the crashkernel kernel
command line option is specified. This makes it unnecessary to do
anything else with the low 1MiB region, because the memory allocated
later won't fall into the low 1MiB area.

This series includes three patches:
[1] x86/kdump: always reserve the low 1MiB when the crashkernel option
is specified
The low 1MiB region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1MiB area.

[2] x86/kdump: remove the unused crash_copy_backup_region()
The crash_copy_backup_region() has never been used, so clean
up the redundant code.

[3] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1MiB region only when the crashkernel option is
specified. (Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log
[2] Improve the third patch based on Eric's suggestions

Lianbo Jiang (3):
  x86/kdump: always reserve the low 1MiB when the crashkernel option is
specified
  x86/kdump: remove the unused crash_copy_backup_region()
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/crash.h   |  1 -
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 87 --
 arch/x86/kernel/machine_kexec_64.c | 47 
 arch/x86/purgatory/purgatory.c | 19 ---
 arch/x86/realmode/init.c   | 11 
 7 files changed, 22 insertions(+), 163 deletions(-)

-- 
2.17.1




[PATCH 2/3 v4] x86/kdump: remove the unused crash_copy_backup_region()

2019-10-17 Thread Lianbo Jiang
The crash_copy_backup_region() has never been used, so clean
up the redundant code.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/crash.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index 0acf5ee45a21..089b2850f9d1 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -3,7 +3,6 @@
 #define _ASM_X86_CRASH_H
 
 int crash_load_segments(struct kimage *image);
-int crash_copy_backup_region(struct kimage *image);
 int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
 void crash_smp_send_stop(void);
-- 
2.17.1



[PATCH 2/3 v3] x86/kdump cleanup: remove the unused crash_copy_backup_region()

2019-10-11 Thread Lianbo Jiang
The crash_copy_backup_region() has never been used, so clean
up the redundant code.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/crash.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/include/asm/crash.h b/arch/x86/include/asm/crash.h
index 0acf5ee45a21..089b2850f9d1 100644
--- a/arch/x86/include/asm/crash.h
+++ b/arch/x86/include/asm/crash.h
@@ -3,7 +3,6 @@
 #define _ASM_X86_CRASH_H
 
 int crash_load_segments(struct kimage *image);
-int crash_copy_backup_region(struct kimage *image);
 int crash_setup_memmap_entries(struct kimage *image,
struct boot_params *params);
 void crash_smp_send_stop(void);
-- 
2.17.1




[PATCH 3/3 v3] x86/kdump: clean up all the code related to the backup region

2019-10-11 Thread Lianbo Jiang
When the crashkernel kernel command line option is specified, the
low 1MiB of memory will always be reserved, so the memory allocated
later won't fall into the low 1MiB area. Therefore, it is no longer
necessary to create a backup region, nor to copy the first 640k of
content into one.

The code related to the backup region can now be safely removed, so
let's clean it up.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 91 ++
 arch/x86/kernel/machine_kexec_64.c | 47 ---
 arch/x86/purgatory/purgatory.c | 19 ---
 5 files changed, 16 insertions(+), 161 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   /* Details of backup region */
-   unsigned long backup_src_start;
-   unsigned long backup_src_sz;
-
-   /* Physical address of backup segment */
-   unsigned long backup_load_addr;
 
/* Core ELF header buffer */
void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include 
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..cc5774fc84c0 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
@@ -234,9 +232,15 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
struct crash_mem *cmem = arg;
 
-   cmem->ranges[cmem->nr_ranges].start = res->start;
-   cmem->ranges[cmem->nr_ranges].end = res->end;
-   cmem->nr_ranges++;
+   if (res->start >= SZ_1M) {
+   cmem->ranges[cmem->nr_ranges].start = res->start;
+   cmem->ranges[cmem->nr_ranges].end = res->end;
+   cmem->nr_ranges++;
+   } else if (res->end > SZ_1M) {
+   cmem->ranges[cmem->nr_ranges].start = SZ_1M;
+   cmem->ranges[cmem->nr_ranges].end = res->end;
+   cmem->nr_ranges++;
+   }
 
return 0;
 }
@@ -246,9 +250,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
unsigned long *sz)
 {
struct crash_mem *cmem;
-   Elf64_Ehdr *ehdr;
-   Elf64_Phdr *phdr;
-   int ret, i;
+   int ret;
 
cmem = fill_up_crash_elf_data();
if (!cmem)
@@ -270,19 +272,6 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
if (ret)
goto out;
 
-   /*
-* If a range matches backup region, adjust offset to backup
-* segment.
-*/
-   ehdr = (Elf64_Ehdr *)*addr;
-   phdr = (Elf64_Phdr *)(ehdr + 1);
-   for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-   if (phdr->p_type == PT_LOAD &&
-   phdr->p_paddr == image->arch.backup_src_start &&
-   phdr->p_memsz == image->arch.backup_src_sz) {
-   phdr->p_offset = image->arch.backup_load_addr;
-   break;
-   }
 out:
vfree(cmem);
return ret;
@@ -321,19 +310,11 @@ static int memmap_exclude_ranges(struct kimage *image, struct crash_mem *cmem,
 unsigned long long mend)
 {
unsigned long start, end;
-   int ret = 0;
 
cmem->ranges[0].start = mstart;
cmem->ranges[0].end = mend;
   

[PATCH 0/3 v3] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-11 Thread Lianbo Jiang
In purgatory(), the main tasks are as follows:

[1] Verify the sha256 hashes of the various segments.
Let's keep this code and leave its logic untouched.

[2] Copy the first 640k of content to a backup region.
Let's safely remove this and clean up all the code related to the
backup region.

This patch series removes the backup region, because the current
handling of copying the first 640k runs into problems when SME is
active.

The low 1MiB region will always be reserved when the crashkernel kernel
command line option is specified, which makes it unnecessary to do
anything with the low 1MiB region, because memory allocated later won't
fall into the low 1MiB area.

This series includes three patches:
[1] Fix 'kmem -s' reported an invalid freepointer when SME was active
The low 1MiB region will always be reserved when the crashkernel
kernel command line option is specified, which ensures that the
memory allocated later won't fall into the low 1MiB area.

[2] x86/kdump cleanup: remove the unused crash_copy_backup_region()
The crash_copy_backup_region() has never been used, so clean
up the redundant code.

[3] x86/kdump: clean up all the code related to the backup region
Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1MiB region only when the crashkernel option is
specified. (Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Lianbo Jiang (3):
  x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was
active
  x86/kdump cleanup: remove the unused crash_copy_backup_region()
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/crash.h   |  1 -
 arch/x86/include/asm/kexec.h   | 10 
 arch/x86/include/asm/purgatory.h   | 10 
 arch/x86/kernel/crash.c| 91 ++
 arch/x86/kernel/machine_kexec_64.c | 47 ---
 arch/x86/purgatory/purgatory.c | 19 ---
 arch/x86/realmode/init.c   | 11 
 7 files changed, 27 insertions(+), 162 deletions(-)

-- 
2.17.1
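
As a standalone illustration of patch 3/3 above: RAM ranges below 1 MiB
are simply skipped, or clipped, when the crash ELF headers are built,
since that area is reserved anyway. A minimal userspace sketch of that
clipping rule (hypothetical names; a sketch of the idea, not the patch
itself):

#include <stdio.h>

#define SZ_1M (1024ULL * 1024ULL)

struct range {
    unsigned long long start;
    unsigned long long end;    /* inclusive */
};

/*
 * Mirror the clipping rule from prepare_elf64_ram_headers_callback():
 * ranges entirely below 1 MiB are dropped, ranges straddling the
 * boundary are trimmed to start at 1 MiB. Returns 1 if the (possibly
 * trimmed) range should be kept, 0 if it should be skipped.
 */
static int clip_low_1m(struct range *r)
{
    if (r->start >= SZ_1M)
        return 1;                 /* untouched */
    if (r->end > SZ_1M) {         /* straddles the 1 MiB boundary */
        r->start = SZ_1M;
        return 1;
    }
    return 0;                     /* entirely below 1 MiB: skip */
}

int main(void)
{
    struct range ranges[] = {
        { 0x1000,   0x9efff },     /* low RAM: dropped */
        { 0x90000,  0x1fffff },    /* straddling: trimmed */
        { 0x100000, 0x7fffffff },  /* high RAM: kept as-is */
    };

    for (unsigned int i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
        struct range r = ranges[i];

        if (clip_low_1m(&r))
            printf("keep [%#llx-%#llx]\n", r.start, r.end);
        else
            printf("drop [%#llx-%#llx]\n", r.start, r.end);
    }
    return 0;
}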



[PATCH 1/3 v3] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-11 Thread Lianbo Jiang
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

The kdump kernel will reuse the first 640k region for several reasons;
for example, the trampoline and the conventional PC system BIOS region
may require memory in this area. Since the kdump kernel will also
overwrite the first 640k region, the kernel has to copy the contents of
the first 640k area to a backup area, which is done in purgatory(),
because vmcore may need the old memory. When vmcore is dumped, the
kdump kernel reads the old memory of the first 640k area from that
backup area.

The root cause is that the kernel does not correctly handle the first
640k region when SME is active, so the old memory is not properly
copied to the backup area in purgatory(). Therefore, the kdump kernel
reads incorrect contents from the backup area when dumping vmcore. The
symptom looks like this:

[root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values

  KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
CPUS: 128
DATE: Thu Sep 19 08:31:18 2019
  UPTIME: 00:01:21
LOAD AVERAGE: 0.16, 0.07, 0.02
   TASKS: 1343
NODENAME: amd-ethanol
 RELEASE: 5.3.0-rc7+
 VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
 MACHINE: x86_64  (2195 Mhz)
  MEMORY: 127.9 GB
   PANIC: "Kernel panic - not syncing: sysrq triggered crash"
 PID: 9789
 COMMAND: "bash"
TASK: "89711894ae80  [THREAD_INFO: 89711894ae80]"
 CPU: 83
   STATE: TASK_RUNNING (PANIC)

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

BTW: I also tried to fix the above problem in purgatory(), but there
are too many restrictions in the purgatory() context; for example, I
can't allocate new memory there to create the identity mapping page
table for the SME case.

Currently, there are two places where the first 640k area is needed:
one is find_trampoline_placement() and the other is reserve_real_mode(),
and the content of the area doesn't matter to either. To avoid the
above error, when the crashkernel kernel command line option is
specified, let's reserve the remaining low 1MiB memory (after the real
mode memory has been reserved) so that allocated memory does not fall
into the low 1MiB area, which means we no longer need to copy the first
640k of content to a backup region in purgatory().

In addition, all the code related to the backup region needs to be
cleaned up later.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/realmode/init.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..bf4c8ffc5ed9 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -34,6 +34,17 @@ void __init reserve_real_mode(void)
 
memblock_reserve(mem, size);
set_real_mode_mem(mem);
+
+#ifdef CONFIG_KEXEC_CORE
+   /*
+* When the crashkernel option is specified, only use the low
+* 1MiB for the real mode trampoline.
+*/
+   if (strstr(boot_command_line, "crashkernel=")) {
+   memblock_reserve(0, SZ_1M);
+   pr_info("Reserving low 1MiB of memory for crashkernel\n");
+   }
+#endif /* CONFIG_KEXEC_CORE */
 }
 
 static void __init setup_real_mode(void)
-- 
2.17.1
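
The check added above is just a substring match on the kernel command
line. A small userspace sketch (hypothetical, purely illustrative) that
performs the same strstr() test against /proc/cmdline, to see whether
the low 1MiB reservation would kick in on a running system:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char cmdline[4096];
    FILE *fp = fopen("/proc/cmdline", "r");

    if (!fp) {
        perror("fopen /proc/cmdline");
        return 1;
    }
    if (!fgets(cmdline, sizeof(cmdline), fp)) {
        fclose(fp);
        return 1;
    }
    fclose(fp);

    /* Same test as the patch: strstr(boot_command_line, "crashkernel=") */
    if (strstr(cmdline, "crashkernel="))
        printf("crashkernel= present; the low 1MiB would be reserved\n");
    else
        printf("crashkernel= not specified\n");

    return 0;
}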



[PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-07 Thread Lianbo Jiang
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

The kdump kernel will reuse the first 640k region for several reasons;
for example, the trampoline and the conventional PC system BIOS region
may require memory in this area. Since the kdump kernel will also
overwrite the first 640k region, the kernel has to copy the contents of
the first 640k area to a backup area, which is done in purgatory(),
because vmcore may need the old memory. When vmcore is dumped, the
kdump kernel reads the old memory of the first 640k area from that
backup area.

The root cause is that the kernel does not correctly handle the first
640k region when SME is active, so the old memory is not properly
copied to the backup area in purgatory(). Therefore, the kdump kernel
reads incorrect contents from the backup area when dumping vmcore. The
symptom looks like this:

[root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values

  KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
CPUS: 128
DATE: Thu Sep 19 08:31:18 2019
  UPTIME: 00:01:21
LOAD AVERAGE: 0.16, 0.07, 0.02
   TASKS: 1343
NODENAME: amd-ethanol
 RELEASE: 5.3.0-rc7+
 VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
 MACHINE: x86_64  (2195 Mhz)
  MEMORY: 127.9 GB
   PANIC: "Kernel panic - not syncing: sysrq triggered crash"
 PID: 9789
 COMMAND: "bash"
TASK: "89711894ae80  [THREAD_INFO: 89711894ae80]"
 CPU: 83
   STATE: TASK_RUNNING (PANIC)

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
freepointer:a6086ac099f0c5a4
crash>

BTW: I also tried to fix the above problem in purgatory(), but there
are too many restrictions in the purgatory() context; for example, I
can't allocate new memory there to create the identity mapping page
table for the SME case.

Currently, there are two places where the first 640k area is needed:
one is find_trampoline_placement() and the other is reserve_real_mode(),
and the content of the area doesn't matter to either. To avoid the
above error, let's occupy the remaining memory of the first 640k region
(except for the trampoline and real mode areas) so that allocated
memory does not fall into the first 640k area when SME is active; then
we no longer need to worry about whether the kernel can correctly copy
the contents of the first 640k area to a backup region in purgatory().

Signed-off-by: Lianbo Jiang 
---
Changes since v1:
1. Improve patch log
2. Change the checking condition from sme_active() to sme_active()
   && strstr(boot_command_line, "crashkernel=")

 arch/x86/kernel/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 77ea96b794bd..bdb1a02a84fd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)
 
reserve_real_mode();
 
+   if (sme_active() && strstr(boot_command_line, "crashkernel="))
+   memblock_reserve(0, 640*1024);
+
trim_platform_memory_ranges();
trim_low_memory_range();
 
-- 
2.17.1




[PATCH 4/4 v2] Limit the size of vmcore-dmesg.txt to 2G

2019-08-23 Thread Lianbo Jiang
With some corrupted vmcore files, the vmcore-dmesg.txt file may grow
without bound until the kdump disk becomes full, and it can also cause
disk error messages such as the following:
...
sd 0:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] tag#6 CDB: Read(10) 28 00 08 06 4c 98 00 00 08 00
blk_update_request: I/O error, dev sda, sector 134630552
sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 08 06 4c 98 00 00 08 00
blk_update_request: I/O error, dev sda, sector 134630552
...

If vmcore-dmesg.txt occupies the whole disk, the vmcore cannot be
saved, which is also a problem.

Let's limit the size of vmcore-dmesg.txt to avoid such problems.

Signed-off-by: Lianbo Jiang 
---
 vmcore-dmesg/vmcore-dmesg.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/vmcore-dmesg/vmcore-dmesg.c b/vmcore-dmesg/vmcore-dmesg.c
index fe7df8ec372c..81c2a58c9d86 100644
--- a/vmcore-dmesg/vmcore-dmesg.c
+++ b/vmcore-dmesg/vmcore-dmesg.c
@@ -5,9 +5,19 @@ typedef Elf32_Nhdr Elf_Nhdr;
 
 extern const char *fname;
 
+/* stole this macro from kernel printk.c */
+#define LOG_BUF_LEN_MAX (uint32_t)(1 << 31)
+
 static void write_to_stdout(char *buf, unsigned int nr)
 {
ssize_t ret;
+   static uint32_t n_bytes = 0;
+
+   n_bytes += nr;
+   if (n_bytes > LOG_BUF_LEN_MAX) {
+   fprintf(stderr, "The vmcore-dmesg.txt over 2G in size is not supported.\n");
+   exit(53);
+   }
 
ret = write(STDOUT_FILENO, buf, nr);
if (ret != nr) {
-- 
2.17.1
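
As a standalone illustration of the capping logic above (userspace
only, hypothetical names): the byte counter is static, so the cap
applies to the cumulative output across all calls, not to a single
write. One deliberate difference in this sketch is that the counter is
64-bit, so it cannot wrap around the 2 GiB threshold.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define LOG_BUF_LEN_MAX ((uint32_t)1 << 31)    /* 2 GiB cap, as in the patch */

/* Write buf to stdout, aborting once the cumulative total exceeds the cap. */
static void write_capped(const char *buf, unsigned int nr)
{
    static uint64_t n_bytes;    /* 64-bit so it cannot wrap */
    ssize_t ret;

    n_bytes += nr;
    if (n_bytes > LOG_BUF_LEN_MAX) {
        fprintf(stderr, "output over 2G in size is not supported\n");
        exit(53);
    }

    ret = write(STDOUT_FILENO, buf, nr);
    if (ret < 0 || (size_t)ret != nr) {
        perror("write");
        exit(54);
    }
}

int main(void)
{
    const char msg[] = "hello from the capped writer\n";

    write_capped(msg, sizeof(msg) - 1);
    return 0;
}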




[PATCH 1/4 v2] Cleanup: remove the read_elf_kcore()

2019-08-23 Thread Lianbo Jiang
There is no need to wrap read_elf() again; let's invoke it directly.
So remove read_elf_kcore() and clean up the redundant code.

Signed-off-by: Lianbo Jiang 
---
 kexec/arch/arm64/kexec-arm64.c |  2 +-
 util_lib/elf_info.c| 15 ++-
 util_lib/include/elf_info.h|  2 +-
 3 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index eb3a3a37307c..6ad3b0a134b3 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -889,7 +889,7 @@ int get_phys_base_from_pt_load(unsigned long *phys_offset)
return EFAILED;
}
 
-   read_elf_kcore(fd);
+   read_elf(fd);
 
for (i = 0; get_pt_load(i,
        &phys_start, NULL, &virt_start, NULL);
diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
index 90a3b21662e7..d9397ecd8626 100644
--- a/util_lib/elf_info.c
+++ b/util_lib/elf_info.c
@@ -764,7 +764,7 @@ static void dump_dmesg(int fd)
dump_dmesg_legacy(fd);
 }
 
-static int read_elf(int fd)
+int read_elf(int fd)
 {
int ret;
 
@@ -824,24 +824,13 @@ int read_elf_vmcore(int fd)
return 0;
 }
 
-int read_elf_kcore(int fd)
-{
-   int ret;
-
-   ret = read_elf(fd);
-   if (ret != 0)
-   return ret;
-
-   return 0;
-}
-
 int read_phys_offset_elf_kcore(int fd, unsigned long *phys_off)
 {
int ret;
 
*phys_off = UINT64_MAX;
 
-   ret = read_elf_kcore(fd);
+   ret = read_elf(fd);
if (!ret) {
/* If we have a valid 'PHYS_OFFSET' by now,
 * return it to the caller now.
diff --git a/util_lib/include/elf_info.h b/util_lib/include/elf_info.h
index 1a4debd2d4ba..c328a1b0ecf2 100644
--- a/util_lib/include/elf_info.h
+++ b/util_lib/include/elf_info.h
@@ -29,7 +29,7 @@ int get_pt_load(int idx,
unsigned long long *virt_start,
unsigned long long *virt_end);
 int read_phys_offset_elf_kcore(int fd, unsigned long *phys_off);
-int read_elf_kcore(int fd);
+int read_elf(int fd);
 int read_elf_vmcore(int fd);
 
 #endif /* ELF_INFO_H */
-- 
2.17.1




[PATCH 0/4 v2] Limit the size of vmcore-dmesg.txt to 2G

2019-08-23 Thread Lianbo Jiang
[PATCH 1/4] Cleanup: remove the read_elf_kcore()
There is no need to wrap read_elf() again; let's invoke it directly.
So remove read_elf_kcore() and clean up the redundant code.

[PATCH 2/4] Fix an error definition about the variable 'fname'
The variable 'fname' is mistakenly defined twice: the first definition
is in vmcore-dmesg.c and the second is in elf_info.c. That is confusing
and incorrect even though both are static, because 'fname' is never
assigned in elf_info.c, so its value will always be NULL when an error
message is printed.

[PATCH 3/4] Cleanup: move it back from util_lib/elf_info.c
Some code related to vmcore-dmesg.c was put into util_lib, which is
not very reasonable, so let's move it back and tidy up that code.
In addition, that will also help to limit the size of vmcore-dmesg.txt.

[PATCH 4/4] Limit the size of vmcore-dmesg.txt to 2G
With some corrupted vmcore files, the vmcore-dmesg.txt file may
grow forever till the kdump disk becomes full. Lets limit the
size of vmcore-dmesg.txt to avoid such problems.

BTW: I tested this patch series on x86_64 and arm64, and it worked well.

Changes since v1:
[1] Split [patch 1/4] and [patch 2/4] into separate patches.
[2] Remove the typedef definition for the handler.
[3] Remove some of the changes to the variable 'fname' and fix its error.

Lianbo Jiang (4):
  Cleanup: remove the read_elf_kcore()
  Fix an error definition about the variable 'fname'
  Cleanup: move it back from util_lib/elf_info.c
  Limit the size of vmcore-dmesg.txt to 2G

 kexec/arch/arm64/kexec-arm64.c |  2 +-
 util_lib/elf_info.c| 65 --
 util_lib/include/elf_info.h|  4 +--
 vmcore-dmesg/vmcore-dmesg.c| 42 --
 4 files changed, 57 insertions(+), 56 deletions(-)

-- 
2.17.1
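
Patch 3/4 above turns the dmesg dumpers into functions that take an
output callback, so the output policy (where the bytes go, and any size
limit) lives in vmcore-dmesg.c rather than in util_lib. A self-contained
sketch of that callback shape, with stand-in names instead of the real
dump_dmesg() internals:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Output callback type, analogous to the handler passed to dump_dmesg(). */
typedef void (*handler_t)(char *buf, unsigned int nr);

static void write_to_stdout(char *buf, unsigned int nr)
{
    ssize_t ret = write(STDOUT_FILENO, buf, nr);

    if (ret < 0 || (size_t)ret != nr)
        perror("write");
}

/* Stand-in for dump_dmesg(): produces some text and hands it to the handler. */
static void dump_dmesg_stub(handler_t handler)
{
    char line[] = "[    0.000000] example log line\n";

    if (handler)
        handler(line, strlen(line));
}

int main(void)
{
    /* The caller decides the output policy by choosing the handler. */
    dump_dmesg_stub(write_to_stdout);
    return 0;
}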




[PATCH 3/4 v2] Cleanup: move it back from util_lib/elf_info.c

2019-08-23 Thread Lianbo Jiang
Some code related to vmcore-dmesg.c was put into util_lib, which is
not very reasonable, so let's move it back and tidy up that code.

In addition, that will also make it possible to limit the size of
vmcore-dmesg.txt in vmcore-dmesg.c instead of elf_info.c.

Signed-off-by: Lianbo Jiang 
---
 util_lib/elf_info.c | 48 +
 util_lib/include/elf_info.h |  2 +-
 vmcore-dmesg/vmcore-dmesg.c | 30 ++-
 3 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
index 5d0efaafab53..2bce5cb1713c 100644
--- a/util_lib/elf_info.c
+++ b/util_lib/elf_info.c
@@ -531,19 +531,7 @@ static int32_t read_file_s32(int fd, uint64_t addr)
return read_file_u32(fd, addr);
 }
 
-static void write_to_stdout(char *buf, unsigned int nr)
-{
-   ssize_t ret;
-
-   ret = write(STDOUT_FILENO, buf, nr);
-   if (ret != nr) {
-   fprintf(stderr, "Failed to write out the dmesg log buffer!:"
-   " %s\n", strerror(errno));
-   exit(54);
-   }
-}
-
-static void dump_dmesg_legacy(int fd)
+static void dump_dmesg_legacy(int fd, void (*handler)(char*, unsigned int))
 {
uint64_t log_buf, log_buf_offset;
unsigned log_end, logged_chars, log_end_wrapped;
@@ -604,7 +592,8 @@ static void dump_dmesg_legacy(int fd)
 */
logged_chars = log_end < log_buf_len ? log_end : log_buf_len;
 
-   write_to_stdout(buf + (log_buf_len - logged_chars), logged_chars);
+   if (handler)
+   handler(buf + (log_buf_len - logged_chars), logged_chars);
 }
 
 static inline uint16_t struct_val_u16(char *ptr, unsigned int offset)
@@ -623,7 +612,7 @@ static inline uint64_t struct_val_u64(char *ptr, unsigned int offset)
 }
 
 /* Read headers of log records and dump accordingly */
-static void dump_dmesg_structured(int fd)
+static void dump_dmesg_structured(int fd, void (*handler)(char*, unsigned int))
 {
 #define OUT_BUF_SIZE   4096
uint64_t log_buf, log_buf_offset, ts_nsec;
@@ -733,7 +722,8 @@ static void dump_dmesg_structured(int fd)
out_buf[len++] = c;
 
if (len >= OUT_BUF_SIZE - 64) {
-   write_to_stdout(out_buf, len);
+   if (handler)
+   handler(out_buf, len);
len = 0;
}
}
@@ -752,16 +742,16 @@ static void dump_dmesg_structured(int fd)
current_idx += loglen;
}
free(buf);
-   if (len)
-   write_to_stdout(out_buf, len);
+   if (len && handler)
+   handler(out_buf, len);
 }
 
-static void dump_dmesg(int fd)
+void dump_dmesg(int fd, void (*handler)(char*, unsigned int))
 {
if (log_first_idx_vaddr)
-   dump_dmesg_structured(fd);
+   dump_dmesg_structured(fd, handler);
else
-   dump_dmesg_legacy(fd);
+   dump_dmesg_legacy(fd, handler);
 }
 
 int read_elf(int fd)
@@ -808,22 +798,6 @@ int read_elf(int fd)
return 0;
 }
 
-int read_elf_vmcore(int fd)
-{
-   int ret;
-
-   ret = read_elf(fd);
-   if (ret > 0) {
-   fprintf(stderr, "Unable to read ELF information"
-   " from vmcore\n");
-   return ret;
-   }
-
-   dump_dmesg(fd);
-
-   return 0;
-}
-
 int read_phys_offset_elf_kcore(int fd, unsigned long *phys_off)
 {
int ret;
diff --git a/util_lib/include/elf_info.h b/util_lib/include/elf_info.h
index c328a1b0ecf2..4bc9279ba603 100644
--- a/util_lib/include/elf_info.h
+++ b/util_lib/include/elf_info.h
@@ -30,6 +30,6 @@ int get_pt_load(int idx,
unsigned long long *virt_end);
 int read_phys_offset_elf_kcore(int fd, unsigned long *phys_off);
 int read_elf(int fd);
-int read_elf_vmcore(int fd);
+void dump_dmesg(int fd, void (*handler)(char*, unsigned int));
 
 #endif /* ELF_INFO_H */
diff --git a/vmcore-dmesg/vmcore-dmesg.c b/vmcore-dmesg/vmcore-dmesg.c
index bebc348a657e..fe7df8ec372c 100644
--- a/vmcore-dmesg/vmcore-dmesg.c
+++ b/vmcore-dmesg/vmcore-dmesg.c
@@ -5,6 +5,34 @@ typedef Elf32_Nhdr Elf_Nhdr;
 
 extern const char *fname;
 
+static void write_to_stdout(char *buf, unsigned int nr)
+{
+   ssize_t ret;
+
+   ret = write(STDOUT_FILENO, buf, nr);
+   if (ret != nr) {
+   fprintf(stderr, "Failed to write out the dmesg log buffer!:"
+   " %s\n", strerror(errno));
+   exit(54);
+   }
+}
+
+static int read_vmcore_dmesg(int fd, void (*handler)(char*, unsigned int))
+{
+   int ret;
+
+   ret = read_elf(fd);
+   if (ret > 0) {
+   fprintf(stderr, "Unable to read ELF information"
+   " from vmcore\n");
+   return ret;
+

[PATCH 2/4 v2] Fix an error definition about the variable 'fname'

2019-08-23 Thread Lianbo Jiang
The variable 'fname' is mistakenly defined twice: the first definition
is in vmcore-dmesg.c and the second is in elf_info.c. That is confusing
and incorrect even though both are static, because 'fname' is never
assigned in elf_info.c, so its value will always be NULL when an error
message is printed.

Signed-off-by: Lianbo Jiang 
---
 util_lib/elf_info.c | 2 +-
 vmcore-dmesg/vmcore-dmesg.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
index d9397ecd8626..5d0efaafab53 100644
--- a/util_lib/elf_info.c
+++ b/util_lib/elf_info.c
@@ -20,7 +20,7 @@
 /* The 32bit and 64bit note headers make it clear we don't care */
 typedef Elf32_Nhdr Elf_Nhdr;
 
-static const char *fname;
+const char *fname;
 static Elf64_Ehdr ehdr;
 static Elf64_Phdr *phdr;
 static int num_pt_loads;
diff --git a/vmcore-dmesg/vmcore-dmesg.c b/vmcore-dmesg/vmcore-dmesg.c
index 7a386b380291..bebc348a657e 100644
--- a/vmcore-dmesg/vmcore-dmesg.c
+++ b/vmcore-dmesg/vmcore-dmesg.c
@@ -3,7 +3,7 @@
 /* The 32bit and 64bit note headers make it clear we don't care */
 typedef Elf32_Nhdr Elf_Nhdr;
 
-static const char *fname;
+extern const char *fname;
 
 int main(int argc, char **argv)
 {
-- 
2.17.1




[PATCH 2/2] Limit the size of vmcore-dmesg.txt to 2G

2019-08-14 Thread Lianbo Jiang
With some corrupted vmcore files, the vmcore-dmesg.txt file may grow
without bound until the kdump disk becomes full, and it can also cause
disk error messages such as the following:
...
sd 0:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] tag#6 CDB: Read(10) 28 00 08 06 4c 98 00 00 08 00
blk_update_request: I/O error, dev sda, sector 134630552
sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET 
driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] tag#7 CDB: Read(10) 28 00 08 06 4c 98 00 00 08 00
blk_update_request: I/O error, dev sda, sector 134630552
...

If vmcore-dmesg.txt occupies the whole disk, the vmcore cannot be
saved, which is also a problem.

Let's limit the size of vmcore-dmesg.txt to avoid such problems.

Signed-off-by: Lianbo Jiang 
---
 vmcore-dmesg/vmcore-dmesg.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/vmcore-dmesg/vmcore-dmesg.c b/vmcore-dmesg/vmcore-dmesg.c
index ff0d540c9130..5ada3566972b 100644
--- a/vmcore-dmesg/vmcore-dmesg.c
+++ b/vmcore-dmesg/vmcore-dmesg.c
@@ -1,8 +1,18 @@
 #include 
 
+/* stole this macro from kernel printk.c */
+#define LOG_BUF_LEN_MAX (uint32_t)(1 << 31)
+
 static void write_to_stdout(char *buf, unsigned int nr)
 {
ssize_t ret;
+   static uint32_t n_bytes = 0;
+
+   n_bytes += nr;
+   if (n_bytes > LOG_BUF_LEN_MAX) {
+   fprintf(stderr, "The vmcore-dmesg.txt over 2G in size is not supported.\n");
+   exit(55);
+   }
 
ret = write(STDOUT_FILENO, buf, nr);
if (ret != nr) {
-- 
2.17.1




[PATCH 1/2] cleanup: move it back from util_lib/elf_info.c

2019-08-14 Thread Lianbo Jiang
Some code related to vmcore-dmesg.c was put into util_lib, which is
not very reasonable, so let's move it back and tidy up that code.

In addition, that will also help to limit the size of vmcore-dmesg.txt.

Signed-off-by: Lianbo Jiang 
---
 kexec/arch/arm64/kexec-arm64.c |  2 +-
 util_lib/elf_info.c| 73 --
 util_lib/include/elf_info.h|  8 +++-
 vmcore-dmesg/vmcore-dmesg.c| 44 +---
 4 files changed, 61 insertions(+), 66 deletions(-)

diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index eb3a3a37307c..6ad3b0a134b3 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -889,7 +889,7 @@ int get_phys_base_from_pt_load(unsigned long *phys_offset)
return EFAILED;
}
 
-   read_elf_kcore(fd);
+   read_elf(fd);
 
for (i = 0; get_pt_load(i,
        &phys_start, NULL, &virt_start, NULL);
diff --git a/util_lib/elf_info.c b/util_lib/elf_info.c
index 90a3b21662e7..2f254e972721 100644
--- a/util_lib/elf_info.c
+++ b/util_lib/elf_info.c
@@ -20,7 +20,6 @@
 /* The 32bit and 64bit note headers make it clear we don't care */
 typedef Elf32_Nhdr Elf_Nhdr;
 
-static const char *fname;
 static Elf64_Ehdr ehdr;
 static Elf64_Phdr *phdr;
 static int num_pt_loads;
@@ -120,8 +119,8 @@ void read_elf32(int fd)
 
ret = pread(fd, &ehdr32, sizeof(ehdr32), 0);
if (ret != sizeof(ehdr32)) {
-   fprintf(stderr, "Read of Elf header from %s failed: %s\n",
-   fname, strerror(errno));
+   fprintf(stderr, "Read of Elf header failed in %s: %s\n",
+   __func__, strerror(errno));
exit(10);
}
 
@@ -193,8 +192,8 @@ void read_elf64(int fd)
 
ret = pread(fd, &ehdr64, sizeof(ehdr64), 0);
if (ret < 0 || (size_t)ret != sizeof(ehdr)) {
-   fprintf(stderr, "Read of Elf header from %s failed: %s\n",
-   fname, strerror(errno));
+   fprintf(stderr, "Read of Elf header failed in %s: %s\n",
+   __func__, strerror(errno));
exit(10);
}
 
@@ -531,19 +530,7 @@ static int32_t read_file_s32(int fd, uint64_t addr)
return read_file_u32(fd, addr);
 }
 
-static void write_to_stdout(char *buf, unsigned int nr)
-{
-   ssize_t ret;
-
-   ret = write(STDOUT_FILENO, buf, nr);
-   if (ret != nr) {
-   fprintf(stderr, "Failed to write out the dmesg log buffer!:"
-   " %s\n", strerror(errno));
-   exit(54);
-   }
-}
-
-static void dump_dmesg_legacy(int fd)
+void dump_dmesg_legacy(int fd, handler_t handler)
 {
uint64_t log_buf, log_buf_offset;
unsigned log_end, logged_chars, log_end_wrapped;
@@ -604,7 +591,7 @@ static void dump_dmesg_legacy(int fd)
 */
logged_chars = log_end < log_buf_len ? log_end : log_buf_len;
 
-   write_to_stdout(buf + (log_buf_len - logged_chars), logged_chars);
+   handler(buf + (log_buf_len - logged_chars), logged_chars);
 }
 
 static inline uint16_t struct_val_u16(char *ptr, unsigned int offset)
@@ -623,7 +610,7 @@ static inline uint64_t struct_val_u64(char *ptr, unsigned int offset)
 }
 
 /* Read headers of log records and dump accordingly */
-static void dump_dmesg_structured(int fd)
+void dump_dmesg_structured(int fd, handler_t handler)
 {
 #define OUT_BUF_SIZE   4096
uint64_t log_buf, log_buf_offset, ts_nsec;
@@ -733,7 +720,7 @@ static void dump_dmesg_structured(int fd)
out_buf[len++] = c;
 
if (len >= OUT_BUF_SIZE - 64) {
-   write_to_stdout(out_buf, len);
+   handler(out_buf, len);
len = 0;
}
}
@@ -753,25 +740,24 @@ static void dump_dmesg_structured(int fd)
}
free(buf);
if (len)
-   write_to_stdout(out_buf, len);
+   handler(out_buf, len);
 }
 
-static void dump_dmesg(int fd)
+int check_log_first_idx_vaddr(void)
 {
if (log_first_idx_vaddr)
-   dump_dmesg_structured(fd);
-   else
-   dump_dmesg_legacy(fd);
+   return 1;
+
+   return 0;
 }
 
-static int read_elf(int fd)
+int read_elf(int fd)
 {
int ret;
 
ret = pread(fd, ehdr.e_ident, EI_NIDENT, 0);
if (ret != EI_NIDENT) {
-   fprintf(stderr, "Read of e_ident from %s failed: %s\n",
-   fname, strerror(errno));
+   fprintf(stderr, "Read of e_ident failed: %s\n", strerror(errno));
return 3;
}
if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
@@ -808,40 +794,13 @@ static int read_elf(int fd)
return 0;
 }
 
-int read

[PATCH 0/2] Limit the size of vmcore-dmesg.txt to 2G

2019-08-14 Thread Lianbo Jiang
[PATCH 1/2] cleanup: move it back from util_lib/elf_info.c
Some code related to vmcore-dmesg.c was put into util_lib, which is
not very reasonable, so let's move it back and tidy up that code.

In addition, that will also help to limit the size of vmcore-dmesg.txt.

[PATCH 2/2] Limit the size of vmcore-dmesg.txt to 2G
With some corrupted vmcore files, the vmcore-dmesg.txt file may
grow forever till the kdump disk becomes full. Lets limit the
size of vmcore-dmesg.txt to avoid such problems.

BTW: I tested this patch on x86_64 and arm64, and it worked well.

Lianbo Jiang (2):
  cleanup: move it back from util_lib/elf_info.c
  Limit the size of vmcore-dmesg.txt to 2G

 kexec/arch/arm64/kexec-arm64.c |  2 +-
 util_lib/elf_info.c| 73 --
 util_lib/include/elf_info.h|  8 +++-
 vmcore-dmesg/vmcore-dmesg.c| 54 ++---
 4 files changed, 71 insertions(+), 66 deletions(-)

-- 
2.17.1




[PATCH 2/3 v3] x86/kexec: Set the C-bit in the identity map page table when SEV is active

2019-04-30 Thread Lianbo Jiang
When SEV is active, the second kernel image is loaded into the
encrypted memory. Lets make sure that when kexec builds the
identity mapping page table it adds the memory encryption mask(C-bit).

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index f60611531d17..11fe352f7344 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -56,6 +56,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
pte_t *pte;
unsigned long vaddr, paddr;
int result = -ENOMEM;
+   pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
 
vaddr = (unsigned long)relocate_kernel;
paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
@@ -92,7 +93,11 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
+
+   if (sev_active())
+   prot = PAGE_KERNEL_EXEC;
+
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
return 0;
 err:
return result;
@@ -129,6 +134,11 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
 
+   if (sev_active()) {
+   info.page_flag |= _PAGE_ENC;
+   info.kernpg_flag = _KERNPG_TABLE;
+   }
+
if (direct_gbpages)
info.direct_gbpages = true;
 
-- 
2.17.1




[PATCH 1/3 v3] x86/kexec: Do not map the kexec area as decrypted when SEV is active

2019-04-30 Thread Lianbo Jiang
When a virtual machine panics, its memory also needs to be dumped for
analysis. But for an SEV virtual machine, the memory is encrypted. To
support SEV kdump, these changes are necessary; otherwise, it will not
work.

Lets consider the following situations:

[1] How to load the images(kernel and initrd) when SEV is enabled in the
first kernel?

Based on amd-memory-encryption.txt and the SEV patch series, the boot
images must be encrypted before the guest (VM) can be booted (please
see Secure Encrypted Virtualization Key Management, 'Launching a guest
(usage flow)'). Naturally, we load the images (kernel and initrd) into
the crash reserved areas in a similar way, and these areas are
encrypted when SEV is active.

That is to say, when SEV is active in the first kernel, the kernel and
initrd need to be loaded into encrypted areas, so I made the following
changes:

[a] Do not map the kexec area as decrypted when SEV is active.
Currently, arch_kexec_post_{alloc,free}_pages() unconditionally maps
the kexec areas as decrypted. Obviously, this does not work for the
SEV case and needs to be improved.

[b] Set the C-bit in the identity map page table when SEV is active.
Because the second kernel images(kernel and initrd) are loaded to
the encrypted areas, in order to correctly access these encrypted
memory(pages), need to set the C-bit in the identity mapping page
table when kexec builds the identity mapping page table.

[2] How to dump the old memory in the second kernel?

This is similar to SME kdump: if SEV was enabled in the first kernel,
the old memory is also encrypted and has to be remapped with the memory
encryption mask in order to access it properly.

[a] The ioremap_encrypted() is still necessary.
Used to remap the old memory with memory encryption mask.

[b] Enable dumping encrypted memory when SEV was active.
Because the whole memory is encrypted in the first kernel when SEV
is enabled, that is to say, the notes and elfcorehdr are also
encrypted, and they are also saved to the encrypted memory.
Following commit 992b649a3f01 ("kdump, proc/vmcore: Enable kdumping
encrypted memory with SME enabled"), both SME and SEV cases need to
be considered and modified correctly.

As above mentioned, currently, the arch_kexec_post_{alloc,free}_pages()
unconditionally maps the kexec area as decrypted. Lets make sure that
arch_kexec_post_{alloc,free}_pages() does not clear the memory encryption
mask from the kexec area when SEV is active.

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index ceba408ea982..f60611531d17 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -559,18 +559,33 @@ void arch_kexec_unprotect_crashkres(void)
kexec_mark_crashkres(false);
 }
 
+/*
+ * During a traditional boot under SME, SME will encrypt the kernel,
+ * so the SME kexec kernel also needs to be un-encrypted in order to
+ * replicate a normal SME boot.
+ * During a traditional boot under SEV, the kernel has already been
+ * loaded encrypted, so the SEV kexec kernel needs to be encrypted in
+ * order to replicate a normal SEV boot.
+ */
 int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 {
+   if (sev_active())
+   return 0;
+
/*
 * If SME is active we need to be sure that kexec pages are
 * not encrypted because when we boot to the new kernel the
 * pages won't be accessed encrypted (initially).
 */
return set_memory_decrypted((unsigned long)vaddr, pages);
+
 }
 
 void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 {
+   if (sev_active())
+   return;
+
/*
 * If SME is active we need to reset the pages back to being
 * an encrypted mapping before freeing them.
-- 
2.17.1




[PATCH 3/3 v3] kdump, proc/vmcore: Enable dumping encrypted memory when SEV was active

2019-04-30 Thread Lianbo Jiang
In the kdump kernel, the memory of the first kernel needs to be dumped
into the vmcore file.

This is similar to SME kdump: if SEV was enabled in the first kernel,
the old memory has to be remapped with the memory encryption mask in
order to access it properly. Commit 992b649a3f01 ("kdump, proc/vmcore:
Enable kdumping encrypted memory with SME enabled") took care of the
SME case, but it uses sme_active(), which checks for SME only. Let's
use mem_encrypt_active(), which returns true when either of them is
active.

Unlike SME, the second kernel's images (kernel and initrd) are loaded
into encrypted memory when SEV is active; hence the kernel ELF header
must be remapped as encrypted in order to access it properly.

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 fs/proc/vmcore.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 3fe90443c1bb..cda6c1922e4f 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -165,7 +165,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
  */
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, false);
+   return read_from_oldmem(buf, count, ppos, 0, sev_active());
 }
 
 /*
@@ -173,7 +173,7 @@ ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
  */
 ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, sme_active());
+   return read_from_oldmem(buf, count, ppos, 0, mem_encrypt_active());
 }
 
 /*
@@ -373,7 +373,7 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos,
buflen);
start = m->paddr + *fpos - m->offset;
tmp = read_from_oldmem(buffer, tsz, &start,
-  userbuf, sme_active());
+  userbuf, mem_encrypt_active());
if (tmp < 0)
return tmp;
buflen -= tsz;
-- 
2.17.1




[PATCH 0/3 v3] Add kdump support for the SEV enabled guest

2019-04-30 Thread Lianbo Jiang
Just as physical machines support kdump, virtual machines also need
it. When a virtual machine panics, we also need to dump its memory
for analysis.

For an SEV virtual machine, the memory is also encrypted. When SEV is
enabled, the second kernel's images (kernel and initrd) are loaded into
encrypted areas; under SME, by contrast, they are loaded into decrypted
areas.

Because of this difference between SME and SEV, we need to map the
kexec memory area properly in order to access it correctly.

Test tools:
makedumpfile[v1.6.5]:
git://git.code.sf.net/p/makedumpfile/code
commit  ("Add support for AMD Secure Memory Encryption")
Note: This patch was merged into the devel branch.

crash-7.2.5: https://github.com/crash-utility/crash.git
commit <942d813cda35> ("Fix for the "kmem -i" option on Linux 5.0")

kexec-tools-2.0.19:
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
commit <942d813cda35> ("Fix for the kmem '-i' option on Linux 5.0")
http://lists.infradead.org/pipermail/kexec/2019-March/022576.html
Note: The second kernel can't boot without this patch.

kernel:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit  ("Merge branch 'akpm' (patches from Andrew)")

Test steps:
[1] load the vmlinux and initrd for kdump
# kexec -p /boot/vmlinuz-5.0.0+ --initrd=/boot/initramfs-5.0.0+kdump.img 
--command-line="BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.0.0+ ro 
resume=UUID=126c5e95-fc8b-48d6-a23b-28409198a52e console=ttyS0,115200 
earlyprintk=serial irqpoll nr_cpus=1 reset_devices cgroup_disable=memory 
mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail 
acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0"

[2] trigger panic
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

[3] check and parse the vmcore
# crash vmlinux /var/crash/127.0.0.1-2019-03-15-05\:03\:42/vmcore

Changes since v1:
1. Modify the patch subject prefixes.
2. Improve patch log: add parentheses at the end of the function names.
3. Fix the multiple confusing checks.
4. Add comment in the arch_kexec_post_alloc_pages().

Changes since v2:
1. Add the explanation to the commit message[Boris' suggestion].
2. Improve the patch log.

Lianbo Jiang (3):
  x86/kexec: Do not map the kexec area as decrypted when SEV is active
  x86/kexec: Set the C-bit in the identity map page table when SEV is
active
  kdump,proc/vmcore: Enable dumping encrypted memory when SEV was active

 arch/x86/kernel/machine_kexec_64.c | 27 ++-
 fs/proc/vmcore.c   |  6 +++---
 2 files changed, 29 insertions(+), 4 deletions(-)

-- 
2.17.1




[PATCH 3/3 v11] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2019-04-22 Thread Lianbo Jiang
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors; SGI is
apparently the only one affected. It actually fails for everyone, but
devices in segment 0 are then found by some legacy lookup method.) The
workaround for this is to pass the I/O reserved regions to the kdump
kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you
don't have ECAM: (a) PCI devices won't work at all on non-x86 systems
that use only ECAM for config access, (b) you won't be able to access
devices on non-0 segments, (c) you won't be able to access extended
config space (address 0x100-0x), which means none of the Extended
Capabilities will be available (AER, ACS, ATS, etc.). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because those
reserved ranges are not present at all in the kdump kernel's e820
table, they are treated as encrypted, and things go wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1
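
The mechanism here is walk_iomem_res_desc() invoking
memmap_entry_callback() once per matching resource, each call appending
one e820 entry of the requested type for the kdump kernel. A
self-contained userspace sketch of that walk-plus-callback pattern
(hypothetical structures and illustrative values, not kernel code):

#include <stdio.h>

#define E820_TYPE_RESERVED 2    /* illustrative value only */

struct res {
    unsigned long long start, end;    /* end inclusive */
    int desc;                         /* resource descriptor */
};

struct e820_entry {
    unsigned long long addr, size;
    int type;
};

enum { DESC_NONE, DESC_RESERVED };

/* Walk all resources carrying the given descriptor and invoke the callback. */
static void walk_res_desc(const struct res *table, int n, int desc,
                          void (*cb)(const struct res *, void *), void *arg)
{
    for (int i = 0; i < n; i++)
        if (table[i].desc == desc)
            cb(&table[i], arg);
}

struct memmap_ctx {
    struct e820_entry entries[16];
    int nr;
    int type;
};

/* Append one e820 entry per matching resource, as memmap_entry_callback() does. */
static void add_e820_entry(const struct res *r, void *arg)
{
    struct memmap_ctx *ctx = arg;

    ctx->entries[ctx->nr].addr = r->start;
    ctx->entries[ctx->nr].size = r->end - r->start + 1;
    ctx->entries[ctx->nr].type = ctx->type;
    ctx->nr++;
}

int main(void)
{
    const struct res iomem[] = {
        { 0x000a0000, 0x000fffff, DESC_RESERVED },
        { 0x00100000, 0x7fffffff, DESC_NONE },
        { 0xfed00000, 0xfed003ff, DESC_RESERVED },
    };
    struct memmap_ctx ctx = { .nr = 0, .type = E820_TYPE_RESERVED };

    walk_res_desc(iomem, 3, DESC_RESERVED, add_e820_entry, &ctx);

    for (int i = 0; i < ctx.nr; i++)
        printf("e820: [%#llx, size %#llx] type %d\n",
               ctx.entries[i].addr, ctx.entries[i].size, ctx.entries[i].type);
    return 0;
}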




[PATCH 2/3 v11] x86/mm: change the check condition in SEV because a new descriptor is introduced

2019-04-22 Thread Lianbo Jiang
Originally, areas described as IORES_DESC_NONE are not mapped encrypted
under SEV when using ioremap(): the code checks for a resource that is
not described as IORES_DESC_NONE, which ensures the reserved areas are
not mapped encrypted.

But now a new descriptor, IORES_DESC_RESERVED, has been created for the
reserved areas; similarly, both IORES_DESC_NONE and IORES_DESC_RESERVED
should not be mapped encrypted under SEV when using ioremap().

Therefore, the SEV check condition needs to be modified and improved.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/mm/ioremap.c  | 59 ++
 include/linux/ioport.h |  9 +++
 2 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index dd73d5d74393..82be5707124b 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -27,9 +27,8 @@
 
 #include "physaddr.h"
 
-struct ioremap_mem_flags {
-   bool system_ram;
-   bool desc_other;
+struct ioremap_desc {
+   unsigned int flags;
 };
 
 /*
@@ -61,13 +60,13 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
return err;
 }
 
-static bool __ioremap_check_ram(struct resource *res)
+static unsigned int __ioremap_check_ram(struct resource *res)
 {
unsigned long start_pfn, stop_pfn;
unsigned long i;
 
if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
-   return false;
+   return 0;
 
start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
stop_pfn = (res->end + 1) >> PAGE_SHIFT;
@@ -75,28 +74,44 @@ static bool __ioremap_check_ram(struct resource *res)
for (i = 0; i < (stop_pfn - start_pfn); ++i)
if (pfn_valid(start_pfn + i) &&
!PageReserved(pfn_to_page(start_pfn + i)))
-   return true;
+   return IORES_MAP_SYSTEM_RAM;
}
 
-   return false;
+   return 0;
 }
 
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * NONE and RESERVED should not be mapped encrypted in SEV because there
+ * the whole memory is already encrypted.
+ */
+static unsigned int __ioremap_check_desc(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   if (!sev_active())
+   return 0;
+
+   switch (res->desc) {
+   case IORES_DESC_NONE:
+   case IORES_DESC_RESERVED:
+   break;
+   default:
+   return IORES_MAP_ENCRYPTED;
+   }
+
+   return 0;
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
 {
-   struct ioremap_mem_flags *flags = arg;
+   struct ioremap_desc *desc = arg;
 
-   if (!flags->system_ram)
-   flags->system_ram = __ioremap_check_ram(res);
+   if (!(desc->flags & IORES_MAP_SYSTEM_RAM))
+   desc->flags |= __ioremap_check_ram(res);
 
-   if (!flags->desc_other)
-   flags->desc_other = __ioremap_check_desc_other(res);
+   if (!(desc->flags & IORES_MAP_ENCRYPTED))
+   desc->flags |= __ioremap_check_desc(res);
 
-   return flags->system_ram && flags->desc_other;
+   return ((desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
+   (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED));
 }
 
 /*
@@ -105,15 +120,15 @@ static int __ioremap_res_check(struct resource *res, void *arg)
  * resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
  */
 static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
-   struct ioremap_mem_flags *flags)
+   struct ioremap_desc *desc)
 {
u64 start, end;
 
start = (u64)addr;
end = start + size - 1;
-   memset(flags, 0, sizeof(*flags));
+   memset(desc, 0, sizeof(struct ioremap_desc));
 
-   walk_mem_res(start, end, flags, __ioremap_res_check);
+   walk_mem_res(start, end, desc, __ioremap_res_check);
 }
 
 /*
@@ -138,7 +153,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
resource_size_t last_addr;
const resource_size_t unaligned_phys_addr = phys_addr;
const unsigned long unaligned_size = size;
-   struct ioremap_mem_flags mem_flags;
+   struct ioremap_desc io_desc;
struct vm_struct *area;
enum page_cache_mode new_pcm;
pgprot_t prot;
@@ -157,12 +172,12 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
return NULL;
}
 
-   __ioremap_check_mem(phys_addr, size, &mem_flags);
+   __ioremap_check_mem(phys_addr, size, &io_desc);
 
/*
 * Don't allow anybody to remap normal RAM that we're using..
 */
-   if (mem_flags.system_ram) {
+ 

[PATCH 0/3 v11] add reserved e820 ranges to the kdump kernel e820 table

2019-04-22 Thread Lianbo Jiang
This patchset did three things:

a). x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_
RESERVED'

b). x86/mm: change the check condition in SEV because a new descriptor is
introduced

c). x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite these patches log.

Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also updates these codes relates to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()

Changes since v8:
1. Get rid of all changes about ia64.(Borislav's suggestion)
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch(add the new I/O resource
   descriptor 'IORES_DESC_RESERVED') was suggested by Boris.

Changes since v9:
1. Improve patch log.
2. No need to modify the kernel/resource.c, so correct them.
3. Change the name of the __ioremap_check_desc_other() to
   __ioremap_check_desc_none_and_reserved(), and modify the
   check condition, add comment above it.

Changes since v10:
1. Split them into three patches, the second patch is currently added.
2. Change struct ioremap_mem_flags to struct ioremap_desc and redefine
it.
3. Change the name of the __ioremap_check_desc_other() to
__ioremap_check_desc().
4. Change the check condition in SEV and also improve them.
5. Modify the return value for some functions.

Lianbo Jiang (3):
  x86/e820, resource: add a new I/O resource descriptor
'IORES_DESC_RESERVED'
  x86/mm: change the check condition in SEV because a new descriptor is
introduced
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/kernel/crash.c |  6 +
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 59 ++---
 include/linux/ioport.h  | 10 +++
 4 files changed, 54 insertions(+), 23 deletions(-)

-- 
2.17.1
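
One detail of patch 2/3 above: the two booleans become a single flags
word, and __ioremap_res_check() keeps OR-ing bits into it as the
resource walk proceeds, stopping early only once both
IORES_MAP_SYSTEM_RAM and IORES_MAP_ENCRYPTED have been seen. A tiny
standalone sketch of that accumulation (illustrative flag values only):

#include <stdio.h>

#define IORES_MAP_SYSTEM_RAM 0x1    /* illustrative values */
#define IORES_MAP_ENCRYPTED  0x2

/*
 * Accumulate flags across a resource walk; the walk may stop early once
 * both bits have been seen, mirroring __ioremap_res_check().
 */
static int res_check(unsigned int *flags, unsigned int this_res)
{
    *flags |= this_res;
    return (*flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
           (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED);
}

int main(void)
{
    unsigned int flags = 0;
    unsigned int per_res[] = { 0, IORES_MAP_ENCRYPTED, IORES_MAP_SYSTEM_RAM };

    for (unsigned int i = 0; i < 3; i++)
        if (res_check(&flags, per_res[i])) {
            printf("walk stopped after resource %u, flags=%#x\n", i, flags);
            return 0;
        }

    printf("walk finished, flags=%#x\n", flags);
    return 0;
}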




[PATCH 1/3 v11] x86/e820, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-04-22 Thread Lianbo Jiang
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices may use it
in kdump kernel, such as PCI devices.

But, the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources via the 'IORES_DESC_NONE', because
there are several types of e820 that are described as the 'IORES_DESC_NONE'
type. Please refer to the e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/e820.c | 2 +-
 include/linux/ioport.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2879e234e193..16fcde196243 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1050,10 +1050,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
-- 
2.17.1




[PATCH 2/2 RESEND v10] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors; SGI is
apparently the only one affected. It actually fails for everyone, but
devices in segment 0 are then found by some legacy lookup method.) The
workaround for this is to pass the I/O reserved regions to the kdump
kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you
don't have ECAM: (a) PCI devices won't work at all on non-x86 systems
that use only ECAM for config access, (b) you won't be able to access
devices on non-0 segments, (c) you won't be able to access extended
config space (address 0x100-0x), which means none of the Extended
Capabilities will be available (AER, ACS, ATS, etc.). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because those
reserved ranges are not present at all in the kdump kernel's e820
table, they are treated as encrypted, and things go wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1




[PATCH 0/2 RESEND v10] add reserved e820 ranges to the kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
This patchset did two things:

a). add a new I/O resource descriptor 'IORES_DESC_RESERVED'
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices may use it
in kdump kernel, such as PCI devices.

But the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources via 'IORES_DESC_NONE', because
several e820 types are described as the 'IORES_DESC_NONE' type. Please
refer to e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also needs to be updated.

b). add the e820 reserved ranges to kdump kernel e820 table
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the only one affected; PCI probing actually fails for
everyone, but devices in segment 0 are then found by some legacy lookup
method.) The workaround for this is to pass the I/O reserved regions to
the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you
don't have ECAM: (a) PCI devices won't work at all on non-x86 systems
that use only ECAM for config access, (b) you won't be able to access
devices on non-0 segments, (c) you won't be able to access extended
config space (address 0x100 and above), which means none of the Extended
Capabilities will be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, and things go wrong.

The e820 reserved ranges are useful in the kdump kernel, so it is
necessary to pass them to the kdump kernel.
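
As a side note, once 'IORES_DESC_RESERVED' exists, the reserved ranges can
be matched exactly through the normal iomem search interface. Below is a
minimal sketch with a hypothetical callback, for illustration only; the
real change is the walk added in patch 2/2:

#include <linux/ioport.h>

/* Illustration only: count the e820 reserved ranges. */
static int count_reserved_cb(struct resource *res, void *arg)
{
	unsigned int *count = arg;

	(*count)++;
	return 0;
}

static unsigned int count_e820_reserved(void)
{
	unsigned int count = 0;

	/* IORES_DESC_RESERVED now matches only E820_TYPE_RESERVED ranges. */
	walk_iomem_res_desc(IORES_DESC_RESERVED, IORESOURCE_MEM, 0, -1,
			    &count, count_reserved_cb);
	return count;
}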

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Fixed the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite the patch logs.

Changes since v6:
1. Modify [PATCH 1/2]: add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also update the code related to 'IORES_DESC_NONE'.
2. Modify [PATCH 2/2]: walk through iomem resources based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()

Changes since v8:
1. Get rid of all changes about ia64.(Borislav's suggestion)
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch(add the new I/O resource
   descriptor 'IORES_DESC_RESERVED') was suggested by Boris.

Changes since v9:
1. Improve patch log.
2. No need to modify kernel/resource.c, so drop those changes.
3. Change the name of the __ioremap_check_desc_other() to
   __ioremap_check_desc_none_and_reserved(), and modify the
   check condition, add comment above it.

Lianbo Jiang (2):
  x86/mm, resource: add a new I/O resource descriptor
'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/kernel/crash.c |  6 ++
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 18 +++---
 include/linux/ioport.h  |  1 +
 4 files changed, 23 insertions(+), 4 deletions(-)

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2 v10] x86/mm, resource: add a new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-03-29 Thread Lianbo Jiang
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices, such as PCI
devices, may use them in the kdump kernel.

But the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources via 'IORES_DESC_NONE', because
several e820 types are described as the 'IORES_DESC_NONE' type. Please
refer to e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also needs to be updated.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/e820.c |  2 +-
 arch/x86/mm/ioremap.c  | 16 ++--
 include/linux/ioport.h |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2879e234e193..16fcde196243 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1050,10 +1050,10 @@ static unsigned long __init 
e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return 
IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0029604af8a4..5671ec24df49 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -81,9 +81,21 @@ static bool __ioremap_check_ram(struct resource *res)
return false;
 }
 
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * Originally, these areas described as IORES_DESC_NONE are not mapped
+ * as encrypted when using ioremap(), for example, E820_TYPE_{RESERVED,
+ * RESERVED_KERN,RAM,UNUSABLE}, etc. It checks for a resource that is
+ * not described as IORES_DESC_NONE, which can make sure the reserved
+ * areas are not mapped as encrypted when using ioremap().
+ *
+ * Now IORES_DESC_RESERVED has been created for the reserved areas so
+ * the check needs to be expanded so that these areas are not mapped
+ * encrypted when using ioremap().
+ */
+static int __ioremap_check_desc_none_and_reserved(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   return ((res->desc != IORES_DESC_NONE) &&
+   (res->desc != IORES_DESC_RESERVED));
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2 v10] add reserved e820 ranges to the kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
This patchset does two things:

a). add a new I/O resource descriptor 'IORES_DESC_RESERVED'
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices, such as PCI
devices, may use them in the kdump kernel.

But the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources via 'IORES_DESC_NONE', because
several e820 types are described as the 'IORES_DESC_NONE' type. Please
refer to e820_type_to_iores_desc().

Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also needs to be updated.

b). add the e820 reserved ranges to kdump kernel e820 table
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the only one affected; PCI probing actually fails for
everyone, but devices in segment 0 are then found by some legacy lookup
method.) The workaround for this is to pass the I/O reserved regions to
the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you
don't have ECAM: (a) PCI devices won't work at all on non-x86 systems
that use only ECAM for config access, (b) you won't be able to access
devices on non-0 segments, (c) you won't be able to access extended
config space (address 0x100 and above), which means none of the Extended
Capabilities will be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, and things go wrong.

The e820 reserved ranges are useful in the kdump kernel, so it is
necessary to pass them to the kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Fixed the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite the patch logs.

Changes since v6:
1. Modify [PATCH 1/2]: add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also update the code related to 'IORES_DESC_NONE'.
2. Modify [PATCH 2/2]: walk through iomem resources based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()

Changes since v8:
1. Get rid of all changes about ia64.(Borislav's suggestion)
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch(add the new I/O resource
   descriptor 'IORES_DESC_RESERVED') was suggested by Boris.

Changes since v9:
1. Improve patch log.
2. No need to modify kernel/resource.c, so drop those changes.
3. Change the name of the __ioremap_check_desc_other() to
   __ioremap_check_desc_none_and_reserved(), and modify the
   check condition, add comment above it.

Lianbo Jiang (2):
  x86/mm, resource: add a new I/O resource descriptor
'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/kernel/crash.c |  6 ++
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 16 ++--
 include/linux/ioport.h  |  1 +
 4 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/2 v10] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2019-03-29 Thread Lianbo Jiang
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the only one affected; PCI probing actually fails for
everyone, but devices in segment 0 are then found by some legacy lookup
method.) The workaround for this is to pass the I/O reserved regions to
the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you
don't have ECAM: (a) PCI devices won't work at all on non-x86 systems
that use only ECAM for config access, (b) you won't be able to access
devices on non-0 segments, (c) you won't be able to access extended
config space (address 0x100 and above), which means none of the Extended
Capabilities will be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, and things go wrong.

The e820 reserved ranges are useful in the kdump kernel, so it is
necessary to pass them to the kdump kernel.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 3/3 v2] kdump, proc/vmcore: Enable kdumping encrypted memory when SEV was active

2019-03-26 Thread Lianbo Jiang
In the kdump kernel, the memory of first kernel needs to be dumped
into the vmcore file.

It is similar to SME: if SEV is enabled in the first kernel, the old
memory has to be remapped with the memory encryption mask in order to
access it properly. Commit 992b649a3f01 ("kdump, proc/vmcore: Enable
kdumping encrypted memory with SME enabled") took care of the SME case,
but it uses sme_active() which checks for SME only. Let's use
mem_encrypt_active(), which returns true when either of them is
active.

Unlike with SME, the first kernel is loaded into encrypted memory
when SEV is enabled, hence the kernel ELF header must be remapped as
encrypted in order to access it properly.

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 fs/proc/vmcore.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 3fe90443c1bb..cda6c1922e4f 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -165,7 +165,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
  */
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, false);
+   return read_from_oldmem(buf, count, ppos, 0, sev_active());
 }
 
 /*
@@ -173,7 +173,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 
*ppos)
  */
 ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, sme_active());
+   return read_from_oldmem(buf, count, ppos, 0, mem_encrypt_active());
 }
 
 /*
@@ -373,7 +373,7 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, 
loff_t *fpos,
buflen);
start = m->paddr + *fpos - m->offset;
tmp = read_from_oldmem(buffer, tsz, &start,
-  userbuf, sme_active());
+  userbuf, mem_encrypt_active());
if (tmp < 0)
return tmp;
buflen -= tsz;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/3 v2] x86/kexec: Set the C-bit in the identity map page table when SEV is active

2019-03-26 Thread Lianbo Jiang
When SEV is active, the second kernel image is loaded into
encrypted memory. Let's make sure that when kexec builds the
identity mapping page table it adds the memory encryption mask (C-bit).

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index f60611531d17..11fe352f7344 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -56,6 +56,7 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
pte_t *pte;
unsigned long vaddr, paddr;
int result = -ENOMEM;
+   pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
 
vaddr = (unsigned long)relocate_kernel;
paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
@@ -92,7 +93,11 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
+
+   if (sev_active())
+   prot = PAGE_KERNEL_EXEC;
+
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
return 0;
 err:
return result;
@@ -129,6 +134,11 @@ static int init_pgtable(struct kimage *image, unsigned 
long start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
 
+   if (sev_active()) {
+   info.page_flag |= _PAGE_ENC;
+   info.kernpg_flag = _KERNPG_TABLE;
+   }
+
if (direct_gbpages)
info.direct_gbpages = true;
 
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/3 v2] Add kdump support for the SEV enabled guest

2019-03-26 Thread Lianbo Jiang
Just as physical machines support kdump, virtual machines also need
kdump. When a virtual machine panics, we also need to dump its memory
for analysis.

For an SEV virtual machine, the memory is also encrypted: when SEV
is enabled, the first kernel is loaded into the encrypted area. With
SME, by contrast, the first kernel is loaded into the decrypted area.

Because of this difference between SME and SEV, we need to properly
map the kexec memory area in order to correctly access it.
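
The intended behaviour can be summarized roughly as below (a sketch only,
not literal kernel code; the real changes are in the patches that follow):

/*
 * SME host kernel: kexec/kdump pages must be mapped decrypted.
 * SEV guest kernel: kexec/kdump pages must keep the C-bit set.
 */
static bool kexec_area_should_be_decrypted(void)
{
	if (sev_active())	/* SEV: leave the area encrypted */
		return false;
	if (sme_active())	/* SME: clear the C-bit for kexec pages */
		return true;
	return false;		/* no memory encryption at all */
}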

Test tools:
makedumpfile[v1.6.5]:
git://git.code.sf.net/p/makedumpfile/code
commit  ("Add support for AMD Secure Memory Encryption")
Note: This patch was merged into the devel branch.

crash-7.2.5: https://github.com/crash-utility/crash.git
commit <942d813cda35> ("Fix for the "kmem -i" option on Linux 5.0")

kexec-tools-2.0.19:
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
commit <942d813cda35> ("Fix for the kmem '-i' option on Linux 5.0")
http://lists.infradead.org/pipermail/kexec/2019-March/022576.html
Note: The second kernel can't boot without this patch.

kernel:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit  ("Merge branch 'akpm' (patches from Andrew)")

Test steps:
[1] load the vmlinux and initrd for kdump
# kexec -p /boot/vmlinuz-5.0.0+ --initrd=/boot/initramfs-5.0.0+kdump.img 
--command-line="BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.0.0+ ro 
resume=UUID=126c5e95-fc8b-48d6-a23b-28409198a52e console=ttyS0,115200 
earlyprintk=serial irqpoll nr_cpus=1 reset_devices cgroup_disable=memory 
mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail 
acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0"

[2] trigger panic
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

[3] check and parse the vmcore
# crash vmlinux /var/crash/127.0.0.1-2019-03-15-05\:03\:42/vmcore

Changes since v1:
1. Modify the patch subject prefixes.
2. Improve patch log: add parentheses at the end of the function names.
3. Fix the multiple confusing checks.
4. Add comment in the arch_kexec_post_alloc_pages().

Lianbo Jiang (3):
  x86/kexec: Do not map the kexec area as decrypted when SEV is active
  x86/kexec: Set the C-bit in the identity map page table when SEV is
active
  kdump,proc/vmcore: Enable kdumping encrypted memory when SEV was
active

 arch/x86/kernel/machine_kexec_64.c | 27 ++-
 fs/proc/vmcore.c   |  6 +++---
 2 files changed, 29 insertions(+), 4 deletions(-)

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/3 v2] x86/kexec: Do not map the kexec area as decrypted when SEV is active

2019-03-26 Thread Lianbo Jiang
Currently, arch_kexec_post_{alloc,free}_pages() unconditionally
maps the kexec area as decrypted. This works fine when SME is active,
because with SME the first kernel is loaded into the decrypted area
by the BIOS, so the second kernel must also be loaded into decrypted
memory.

When SEV is active, the first kernel is loaded into the encrypted
area, so the second kernel must also be loaded into encrypted
memory. Let's make sure that arch_kexec_post_{alloc,free}_pages()
does not clear the memory encryption mask from the kexec area when
SEV is active.

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index ceba408ea982..f60611531d17 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -559,18 +559,33 @@ void arch_kexec_unprotect_crashkres(void)
kexec_mark_crashkres(false);
 }
 
+/*
+ * During a traditional boot under SME, SME will encrypt the kernel,
+ * so the SME kexec kernel also needs to be un-encrypted in order to
+ * replicate a normal SME boot.
+ * During a traditional boot under SEV, the kernel has already been
+ * loaded encrypted, so the SEV kexec kernel needs to be encrypted in
+ * order to replicate a normal SEV boot.
+ */
 int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 {
+   if (sev_active())
+   return 0;
+
/*
 * If SME is active we need to be sure that kexec pages are
 * not encrypted because when we boot to the new kernel the
 * pages won't be accessed encrypted (initially).
 */
return set_memory_decrypted((unsigned long)vaddr, pages);
+
 }
 
 void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 {
+   if (sev_active())
+   return;
+
/*
 * If SME is active we need to reset the pages back to being
 * an encrypted mapping before freeing them.
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/3 v9] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2019-03-21 Thread Lianbo Jiang
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel. But the kernel can not exactly
match the e820 reserved ranges when walking through the iomem resources
with the descriptor 'IORES_DESC_NONE', because several e820 types
(e.g. E820_TYPE_RESERVED_KERN, E820_TYPE_RAM, E820_TYPE_UNUSABLE and
E820_TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'.
It may pass all four of these types to the kdump kernel, which is not
the desired result.

So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
the code originally related to the descriptor 'IORES_DESC_NONE' needs to
be updated. Otherwise, it is easily confused and can also cause errors,
because the 'E820_TYPE_RESERVED' type is now converted to the new
descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE'.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/e820.c | 2 +-
 include/linux/ioport.h | 1 +
 kernel/resource.c  | 6 +++---
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2879e234e193..16fcde196243 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1050,10 +1050,10 @@ static unsigned long __init 
e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return 
IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
diff --git a/kernel/resource.c b/kernel/resource.c
index e81b17b53fa5..ee7348761858 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -990,7 +990,7 @@ __reserve_region_with_split(struct resource *root, 
resource_size_t start,
res->start = start;
res->end = end;
res->flags = type | IORESOURCE_BUSY;
-   res->desc = IORES_DESC_NONE;
+   res->desc = IORES_DESC_RESERVED;
 
while (1) {
 
@@ -1025,7 +1025,7 @@ __reserve_region_with_split(struct resource *root, 
resource_size_t start,
next_res->start = conflict->end + 1;
next_res->end = end;
next_res->flags = type | IORESOURCE_BUSY;
-   next_res->desc = IORES_DESC_NONE;
+   next_res->desc = IORES_DESC_RESERVED;
}
} else {
res->start = conflict->end + 1;
@@ -1488,7 +1488,7 @@ static int __init reserve_setup(char *str)
res->start = io_start;
res->end = io_start + io_num - 1;
res->flags |= IORESOURCE_BUSY;
-   res->desc = IORES_DESC_NONE;
+   res->desc = IORES_DESC_RESERVED;
res->child = NULL;
if (request_resource(parent, res) == 0)
reserved = x+1;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 3/3 v9] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2019-03-21 Thread Lianbo Jiang
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs (for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservation via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the only
one affected; PCI probing actually fails for everyone, but devices in
segment 0 are then found by some legacy lookup method.) The workaround
for this is to pass the I/O reserved regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you
don't have ECAM: (a) PCI devices won't work at all on non-x86 systems
that use only ECAM for config access, (b) you won't be able to access
devices on non-0 segments, (c) you won't be able to access extended
config space (address 0x100 and above), which means none of the Extended
Capabilities will be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, and things go wrong.

The e820 reserved ranges are useful in the kdump kernel, so it is
necessary to pass them to the kdump kernel.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/3 v9] add reserved e820 ranges to the kdump kernel e820 table

2019-03-21 Thread Lianbo Jiang
This patchset does three things:
a). Change the examination condition to avoid confusion

Since commit <0e4c12b45aa8> ("x86/mm, resource: Use PAGE_KERNEL
protection for ioremap of memory pages"), the check here is really
looking for the 'IORES_DESC_ACPI_*' values. Therefore, it is necessary
to change the examination condition to avoid confusion.

b). add a new I/O resource descriptor 'IORES_DESC_RESERVED'

When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel. But the kernel can not exactly
match the e820 reserved ranges when walking through the iomem resources
with the descriptor 'IORES_DESC_NONE', because several e820 types
(e.g. E820_TYPE_RESERVED_KERN, E820_TYPE_RAM, E820_TYPE_UNUSABLE and
E820_TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'.
It may pass all four of these types to the kdump kernel, which is not
the desired result.

So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
the code originally related to the descriptor 'IORES_DESC_NONE' needs to
be updated. Otherwise, it is easily confused and can also cause errors,
because the 'E820_TYPE_RESERVED' type is now converted to the new
descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE'.

c). add the e820 reserved ranges to kdump kernel e820 table

At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs (for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservation via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the only
one affected; PCI probing actually fails for everyone, but devices in
segment 0 are then found by some legacy lookup method.) The workaround
for this is to pass the I/O reserved regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you
don't have ECAM: (a) PCI devices won't work at all on non-x86 systems
that use only ECAM for config access, (b) you won't be able to access
devices on non-0 segments, (c) you won't be able to access extended
config space (address 0x100 and above), which means none of the Extended
Capabilities will be available (AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still decrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, and things go wrong.

The e820 reserved ranges are useful in the kdump kernel, so it is
necessary to pass them to the kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Fixed the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite the patch logs.

Changes since v6:
1. Modify [PATCH 1/2]: add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also update the code related to 'IORES_DESC_NONE'.
2. Modify [PATCH 2/2]: walk through iomem resources based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()

Changes since v8:
1. Get rid of all changes about ia64.(Borislav's suggestion)
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch(add the new I/O resource
   descriptor 'IORES_DESC_RESERVED') was suggested by Boris.

Lianbo Jiang (3):
  x86/mm: Change the examination condition to avoid confusion
  resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/x86/kernel/crash.c | 6 ++
 arch/x86/kernel/e820.c  | 2 +-
 arch/x86

[PATCH 1/3 v9] x86/mm: Change the examination condition to avoid confusion

2019-03-21 Thread Lianbo Jiang
Since commit <0e4c12b45aa8> ("x86/mm, resource: Use PAGE_KERNEL
protection for ioremap of memory pages"), __ioremap_check_desc_other()
is really checking for the 'IORES_DESC_ACPI_*' values. Therefore, it is
necessary to change the examination condition to avoid confusion.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/mm/ioremap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0029604af8a4..0e3ba620612d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -83,7 +83,8 @@ static bool __ioremap_check_ram(struct resource *res)
 
 static int __ioremap_check_desc_other(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   return ((res->desc == IORES_DESC_ACPI_TABLES) ||
+   (res->desc == IORES_DESC_ACPI_NV_STORAGE));
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/3] Add kdump support for the SEV enabled guest

2019-03-15 Thread Lianbo Jiang
For the AMD SEV machines, add kdump support when the SEV is enabled.

Test tools:
makedumpfile[v1.6.5]:
git://git.code.sf.net/p/makedumpfile/code
commit  ("Add support for AMD Secure Memory Encryption")
Note: This patch was merged into the devel branch.

crash-7.2.5: https://github.com/crash-utility/crash.git

kexec-tools-2.0.19:
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
commit <942d813cda35> ("Fix for the kmem '-i' option on Linux 5.0")
http://lists.infradead.org/pipermail/kexec/2019-March/022576.html
Note: The second kernel can't boot without this patch.

kernel:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit  ("Merge branch 'akpm' (patches from Andrew)")

Test steps:
[1] load the vmlinux and initrd for kdump
# kexec -p /boot/vmlinuz-5.0.0+ --initrd=/boot/initramfs-5.0.0+kdump.img 
--command-line="BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.0.0+ ro 
resume=UUID=126c5e95-fc8b-48d6-a23b-28409198a52e console=ttyS0,115200 
earlyprintk=serial irqpoll nr_cpus=1 reset_devices cgroup_disable=memory 
mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail 
acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0"

[2] trigger panic
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

[3] check and parse the vmcore
# crash vmlinux /var/crash/127.0.0.1-2019-03-15-05\:03\:42/vmcore

Lianbo Jiang (3):
  kexec: Do not map the kexec area as decrypted when SEV is active
  kexec: Set the C-bit in the identity map page table when SEV is active
  kdump,proc/vmcore: Enable kdumping encrypted memory when SEV was
active

 arch/x86/kernel/machine_kexec_64.c | 20 +---
 fs/proc/vmcore.c   |  6 +++---
 2 files changed, 20 insertions(+), 6 deletions(-)

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 3/3] kdump, proc/vmcore: Enable kdumping encrypted memory when SEV was active

2019-03-15 Thread Lianbo Jiang
In the kdump kernel, the memory of first kernel needs to be dumped
into the vmcore file.

It is similar to SME: if SEV is enabled in the first kernel, the old
memory has to be remapped with the memory encryption mask in order to
access it properly. Commit 992b649a3f01 ("kdump, proc/vmcore: Enable
kdumping encrypted memory with SME enabled") took care of the SME case,
but it uses sme_active() which checks for SME only. Let's use
mem_encrypt_active(), which returns true when either of them is
active.

Unlike with SME, the first kernel is loaded into encrypted memory
when SEV is enabled, hence the kernel ELF header must be remapped as
encrypted in order to access it properly.

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 fs/proc/vmcore.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 3fe90443c1bb..cda6c1922e4f 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -165,7 +165,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
  */
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, false);
+   return read_from_oldmem(buf, count, ppos, 0, sev_active());
 }
 
 /*
@@ -173,7 +173,7 @@ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 
*ppos)
  */
 ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, sme_active());
+   return read_from_oldmem(buf, count, ppos, 0, mem_encrypt_active());
 }
 
 /*
@@ -373,7 +373,7 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, 
loff_t *fpos,
buflen);
start = m->paddr + *fpos - m->offset;
tmp = read_from_oldmem(buffer, tsz, &start,
-  userbuf, sme_active());
+  userbuf, mem_encrypt_active());
if (tmp < 0)
return tmp;
buflen -= tsz;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/3] kexec: Set the C-bit in the identity map page table when SEV is active

2019-03-15 Thread Lianbo Jiang
When SEV is active, the second kernel image is loaded into
encrypted memory. Let's make sure that when kexec builds the
identity mapping page table it adds the memory encryption mask (C-bit).

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index bcebf4993da4..8c58d1864500 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -56,6 +56,7 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
pte_t *pte;
unsigned long vaddr, paddr;
int result = -ENOMEM;
+   pgprot_t prot = PAGE_KERNEL_EXEC_NOENC;
 
vaddr = (unsigned long)relocate_kernel;
paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
@@ -92,7 +93,11 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
}
pte = pte_offset_kernel(pmd, vaddr);
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
+
+   if (sev_active())
+   prot = PAGE_KERNEL_EXEC;
+
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
return 0;
 err:
return result;
@@ -129,6 +134,11 @@ static int init_pgtable(struct kimage *image, unsigned 
long start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
 
+   if (sev_active()) {
+   info.page_flag |= _PAGE_ENC;
+   info.kernpg_flag = _KERNPG_TABLE;
+   }
+
if (direct_gbpages)
info.direct_gbpages = true;
 
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/3] kexec: Do not map the kexec area as decrypted when SEV is active

2019-03-15 Thread Lianbo Jiang
Currently, arch_kexec_post_{alloc,free}_pages() unconditionally
maps the kexec area as decrypted. This works fine when SME is active,
because with SME the first kernel is loaded into the decrypted area
by the BIOS, so the second kernel must also be loaded into decrypted
memory.

When SEV is active, the first kernel is loaded into the encrypted
area, so the second kernel must also be loaded into encrypted
memory. Let's make sure that arch_kexec_post_{alloc,free}_pages()
does not clear the memory encryption mask from the kexec area when
SEV is active.

Co-developed-by: Brijesh Singh 
Signed-off-by: Brijesh Singh 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index ceba408ea982..bcebf4993da4 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -566,7 +566,10 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int 
pages, gfp_t gfp)
 * not encrypted because when we boot to the new kernel the
 * pages won't be accessed encrypted (initially).
 */
-   return set_memory_decrypted((unsigned long)vaddr, pages);
+   if (sme_active())
+   return set_memory_decrypted((unsigned long)vaddr, pages);
+
+   return 0;
 }
 
 void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
@@ -575,5 +578,6 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int 
pages)
 * If SME is active we need to reset the pages back to being
 * an encrypted mapping before freeing them.
 */
-   set_memory_encrypted((unsigned long)vaddr, pages);
+   if (sme_active())
+   set_memory_encrypted((unsigned long)vaddr, pages);
 }
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec/x86: Unconditionally add the acpi_rsdp command line

2019-03-15 Thread Lianbo Jiang
The Linux kernel commit 3a63f70bf4c3 introduces early parsing of the
RSDP. This means that the boot loader must either set
boot_params.acpi_rsdp_addr or pass the 'acpi_rsdp=xxx' command line
option to tell the kernel the RSDP physical address.

Currently, kexec neither sets boot_params.acpi_rsdp_addr nor passes
the acpi_rsdp command line option if it sees that the first kernel
supports the EFI runtime. This causes the second kernel to fail to
boot. The EFI runtime is not available that early in the boot process,
so unconditionally pass 'acpi_rsdp=xxx' to the second kernel.
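
For illustration, the appended option simply carries the RSDP physical
address, i.e. 'acpi_rsdp=0x<physical address>'. Below is a minimal
user-space sketch of building that string; get_acpi_rsdp_addr() is a
hypothetical helper here, not the actual kexec-tools function:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical: however the RSDP physical address is obtained. */
extern uint64_t get_acpi_rsdp_addr(void);

/* Append " acpi_rsdp=0x..." to the kdump kernel command line buffer. */
static void append_acpi_rsdp(char *cmdline, size_t len)
{
	uint64_t rsdp = get_acpi_rsdp_addr();
	size_t used = strlen(cmdline);

	if (rsdp)
		snprintf(cmdline + used, len - used,
			 " acpi_rsdp=0x%llx", (unsigned long long)rsdp);
}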

Signed-off-by: Lianbo Jiang 
Signed-off-by: Brijesh Singh 
---
 kexec/arch/i386/crashdump-x86.c | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index 140f45b..a29b15b 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -35,7 +35,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "../../kexec.h"
 #include "../../kexec-elf.h"
 #include "../../kexec-syscall.h"
@@ -772,18 +771,6 @@ static enum coretype get_core_type(struct crash_elf_info 
*elf_info,
}
 }
 
-static int sysfs_efi_runtime_map_exist(void)
-{
-   DIR *dir;
-
-   dir = opendir("/sys/firmware/efi/runtime-map");
-   if (!dir)
-   return 0;
-
-   closedir(dir);
-   return 1;
-}
-
 /* Appends 'acpi_rsdp=' commandline for efi boot crash dump */
 static void cmdline_add_efi(char *cmdline)
 {
@@ -978,9 +965,7 @@ int load_crashdump_segments(struct kexec_info *info, char* 
mod_cmdline,
dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
if (delete_memmap(memmap_p, _memmap, elfcorehdr, memsz) < 0)
return -1;
-   if (!bzImage_support_efi_boot || arch_options.noefi ||
-   !sysfs_efi_runtime_map_exist())
-   cmdline_add_efi(mod_cmdline);
+   cmdline_add_efi(mod_cmdline);
cmdline_add_elfcorehdr(mod_cmdline, elfcorehdr);
 
/* Inform second kernel about the presence of ACPI tables. */
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v3] Remove the memory encryption mask to obtain the true physical address

2019-01-29 Thread Lianbo Jiang
For an AMD machine with the SME feature, if SME is enabled in the
first kernel, the crashed kernel's page tables (pgd/pud/pmd/pte)
contain the memory encryption mask, so makedumpfile needs to remove
the memory encryption mask to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
Changes since v1:
1. Merge them into a patch.
2. The sme_mask is not an enum number, remove it.
3. Sanity check whether the sme_mask is in vmcoreinfo.
4. Deal with the huge pages case.
5. Cover the 5-level path.

Changes since v2:
1. Change the sme_me_mask to entry_mask.
2. No need to remove the mask when makedumpfile prints out the
   value of the entry.
3. Remove the sme mask from the pte at the end of the __vtop4_x86_64().
4. Also need to remove the sme mask from page table entry in
   find_vmemmap_x86_64()

 arch/x86_64.c  | 30 +++---
 makedumpfile.c |  4 
 makedumpfile.h |  1 +
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/arch/x86_64.c b/arch/x86_64.c
index 537fb78..9977466 100644
--- a/arch/x86_64.c
+++ b/arch/x86_64.c
@@ -291,6 +291,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
unsigned long page_dir, pgd, pud_paddr, pud_pte, pmd_paddr, pmd_pte;
unsigned long pte_paddr, pte;
unsigned long p4d_paddr, p4d_pte;
+   unsigned long entry_mask = ENTRY_MASK;
 
/*
 * Get PGD.
@@ -302,6 +303,9 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
return NOT_PADDR;
}
 
+   if (NUMBER(sme_mask) != NOT_FOUND_NUMBER)
+   entry_mask &= ~(NUMBER(sme_mask));
+
if (check_5level_paging()) {
page_dir += pgd5_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, page_dir, , sizeof pgd)) {
@@ -318,7 +322,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
/*
 * Get P4D.
 */
-   p4d_paddr  = pgd & ENTRY_MASK;
+   p4d_paddr  = pgd & entry_mask;
p4d_paddr += p4d_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, p4d_paddr, _pte, sizeof p4d_pte)) {
ERRMSG("Can't get p4d_pte (p4d_paddr:%lx).\n", 
p4d_paddr);
@@ -331,7 +335,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
ERRMSG("Can't get a valid p4d_pte.\n");
return NOT_PADDR;
}
-   pud_paddr  = p4d_pte & ENTRY_MASK;
+   pud_paddr  = p4d_pte & entry_mask;
}else {
page_dir += pgd_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, page_dir, , sizeof pgd)) {
@@ -345,7 +349,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
ERRMSG("Can't get a valid pgd.\n");
return NOT_PADDR;
}
-   pud_paddr  = pgd & ENTRY_MASK;
+   pud_paddr  = pgd & entry_mask;
}
 
/*
@@ -364,13 +368,13 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long 
pagetable)
return NOT_PADDR;
}
if (pud_pte & _PAGE_PSE)/* 1GB pages */
-   return (pud_pte & ENTRY_MASK & PUD_MASK) +
+   return (pud_pte & entry_mask & PUD_MASK) +
(vaddr & ~PUD_MASK);
 
/*
 * Get PMD.
 */
-   pmd_paddr  = pud_pte & ENTRY_MASK;
+   pmd_paddr  = pud_pte & entry_mask;
pmd_paddr += pmd_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, pmd_paddr, _pte, sizeof pmd_pte)) {
ERRMSG("Can't get pmd_pte (pmd_paddr:%lx).\n", pmd_paddr);
@@ -384,13 +388,13 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long 
pagetable)
return NOT_PADDR;
}
if (pmd_pte & _PAGE_PSE)/* 2MB pages */
-   return (pmd_pte & ENTRY_MASK & PMD_MASK) +
+   return (pmd_pte & entry_mask & PMD_MASK) +
(vaddr & ~PMD_MASK);
 
/*
 * Get PTE.
 */
-   pte_paddr  = pmd_pte & ENTRY_MASK;
+   pte_paddr  = pmd_pte & entry_mask;
pte_paddr += pte_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, pte_paddr, , sizeof pte)) {
ERRMSG("Can't get pte (pte_paddr:%lx).\n", pte_paddr);
@@ -403,7 +407,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
ERRMSG("Can't get a valid pte.\n");
return NOT_PADDR;
}
-   return (pte & ENTRY_MASK) + PAGEOFFSET(vaddr);
+   return (pte & entry_mask) + PAGEOFFSET(vaddr);
 }
 
 unsigned long long
@@ -636,6 +640,7 @@ find_vmemmap_x86_64()
unsigned long pmd, tpfn;
unsigned long pvaddr = 0;
unsigned lo

[PATCH v2] Remove the memory encryption mask to obtain the true physical address

2019-01-27 Thread Lianbo Jiang
For an AMD machine with the SME feature, if SME is enabled in the
first kernel, the crashed kernel's page tables (pgd/pud/pmd/pte)
contain the memory encryption mask, so makedumpfile needs to remove
the memory encryption mask to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
Changes since v1:
1. Merge them into a patch.
2. The sme_mask is not an enum number, remove it.
3. Sanity check whether the sme_mask is in vmcoreinfo.
4. Deal with the huge pages case.
5. Cover the 5-level path.

 arch/x86_64.c  | 30 +-
 makedumpfile.c |  4 
 makedumpfile.h |  1 +
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/x86_64.c b/arch/x86_64.c
index 537fb78..7b3ed10 100644
--- a/arch/x86_64.c
+++ b/arch/x86_64.c
@@ -291,6 +291,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
unsigned long page_dir, pgd, pud_paddr, pud_pte, pmd_paddr, pmd_pte;
unsigned long pte_paddr, pte;
unsigned long p4d_paddr, p4d_pte;
+   unsigned long sme_me_mask = ~0UL;
 
/*
 * Get PGD.
@@ -302,6 +303,9 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
return NOT_PADDR;
}
 
+   if (NUMBER(sme_mask) != NOT_FOUND_NUMBER)
+   sme_me_mask = ~(NUMBER(sme_mask));
+
if (check_5level_paging()) {
page_dir += pgd5_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, page_dir, , sizeof pgd)) {
@@ -309,7 +313,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
return NOT_PADDR;
}
if (info->vaddr_for_vtop == vaddr)
-   MSG("  PGD : %16lx => %16lx\n", page_dir, pgd);
+   MSG("  PGD : %16lx => %16lx\n", page_dir, (pgd & 
sme_me_mask));
 
if (!(pgd & _PAGE_PRESENT)) {
ERRMSG("Can't get a valid pgd.\n");
@@ -318,20 +322,20 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long 
pagetable)
/*
 * Get P4D.
 */
-   p4d_paddr  = pgd & ENTRY_MASK;
+   p4d_paddr  = pgd & ENTRY_MASK & sme_me_mask;
p4d_paddr += p4d_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, p4d_paddr, _pte, sizeof p4d_pte)) {
ERRMSG("Can't get p4d_pte (p4d_paddr:%lx).\n", 
p4d_paddr);
return NOT_PADDR;
}
if (info->vaddr_for_vtop == vaddr)
-   MSG("  P4D : %16lx => %16lx\n", p4d_paddr, p4d_pte);
+   MSG("  P4D : %16lx => %16lx\n", p4d_paddr, (p4d_pte & 
sme_me_mask));
 
if (!(p4d_pte & _PAGE_PRESENT)) {
ERRMSG("Can't get a valid p4d_pte.\n");
return NOT_PADDR;
}
-   pud_paddr  = p4d_pte & ENTRY_MASK;
+   pud_paddr  = p4d_pte & ENTRY_MASK & sme_me_mask;
}else {
page_dir += pgd_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, page_dir, , sizeof pgd)) {
@@ -339,13 +343,13 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long 
pagetable)
return NOT_PADDR;
}
if (info->vaddr_for_vtop == vaddr)
-   MSG("  PGD : %16lx => %16lx\n", page_dir, pgd);
+   MSG("  PGD : %16lx => %16lx\n", page_dir, (pgd & 
sme_me_mask));
 
if (!(pgd & _PAGE_PRESENT)) {
ERRMSG("Can't get a valid pgd.\n");
return NOT_PADDR;
}
-   pud_paddr  = pgd & ENTRY_MASK;
+   pud_paddr  = pgd & ENTRY_MASK & sme_me_mask;
}
 
/*
@@ -357,47 +361,47 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long 
pagetable)
return NOT_PADDR;
}
if (info->vaddr_for_vtop == vaddr)
-   MSG("  PUD : %16lx => %16lx\n", pud_paddr, pud_pte);
+   MSG("  PUD : %16lx => %16lx\n", pud_paddr, (pud_pte & 
sme_me_mask));
 
if (!(pud_pte & _PAGE_PRESENT)) {
ERRMSG("Can't get a valid pud_pte.\n");
return NOT_PADDR;
}
if (pud_pte & _PAGE_PSE)/* 1GB pages */
-   return (pud_pte & ENTRY_MASK & PUD_MASK) +
+   return (pud_pte & ENTRY_MASK & PUD_MASK & sme_me_mask) +
(vaddr & ~PUD_MASK);
 
/*
 * Get PMD.
 */
-   pmd_paddr  = pud_pte & ENTRY_MASK;
+   pmd_paddr  = pud_pte & ENTRY_MASK & sme_me_mask;
pmd_paddr += pmd_index(vaddr) * sizeof(unsigned long

[PATCH 2/2] Remove the memory encryption mask to obtain the true physical address

2019-01-22 Thread Lianbo Jiang
For an AMD machine with the SME feature, if SME is enabled in the
first kernel, the crashed kernel's page tables (pgd/pud/pmd/pte)
contain the memory encryption mask, so makedumpfile needs to remove
the memory encryption mask to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
 arch/x86_64.c  | 3 +++
 makedumpfile.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/x86_64.c b/arch/x86_64.c
index 537fb78..7651d36 100644
--- a/arch/x86_64.c
+++ b/arch/x86_64.c
@@ -346,6 +346,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
return NOT_PADDR;
}
pud_paddr  = pgd & ENTRY_MASK;
+   pud_paddr = pud_paddr & ~(NUMBER(sme_mask));
}
 
/*
@@ -371,6 +372,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
 * Get PMD.
 */
pmd_paddr  = pud_pte & ENTRY_MASK;
+   pmd_paddr = pmd_paddr & ~(NUMBER(sme_mask));
pmd_paddr += pmd_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, pmd_paddr, _pte, sizeof pmd_pte)) {
ERRMSG("Can't get pmd_pte (pmd_paddr:%lx).\n", pmd_paddr);
@@ -391,6 +393,7 @@ __vtop4_x86_64(unsigned long vaddr, unsigned long pagetable)
 * Get PTE.
 */
pte_paddr  = pmd_pte & ENTRY_MASK;
+   pte_paddr = pte_paddr & ~(NUMBER(sme_mask));
pte_paddr += pte_index(vaddr) * sizeof(unsigned long);
if (!readmem(PADDR, pte_paddr, , sizeof pte)) {
ERRMSG("Can't get pte (pte_paddr:%lx).\n", pte_paddr);
diff --git a/makedumpfile.c b/makedumpfile.c
index a03aaa1..81c7bb4 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -977,6 +977,7 @@ next_page:
read_size = MIN(info->page_size - PAGEOFFSET(paddr), size);
 
pgaddr = PAGEBASE(paddr);
+   pgaddr = pgaddr & ~(NUMBER(sme_mask));
pgbuf = cache_search(pgaddr, read_size);
if (!pgbuf) {
++cache_miss;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2] Makedumpfile: add a new variable 'sme_mask' to number_table

2019-01-22 Thread Lianbo Jiang
It will be used to store the SME mask for the crashed kernel; the
sme_mask denotes whether the old memory is encrypted or not.

Signed-off-by: Lianbo Jiang 
---
 makedumpfile.c | 3 +++
 makedumpfile.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/makedumpfile.c b/makedumpfile.c
index 8923538..a03aaa1 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -1743,6 +1743,7 @@ get_structure_info(void)
ENUM_NUMBER_INIT(NR_FREE_PAGES, "NR_FREE_PAGES");
ENUM_NUMBER_INIT(N_ONLINE, "N_ONLINE");
ENUM_NUMBER_INIT(pgtable_l5_enabled, "pgtable_l5_enabled");
+   ENUM_NUMBER_INIT(sme_mask, "sme_mask");
 
ENUM_NUMBER_INIT(PG_lru, "PG_lru");
ENUM_NUMBER_INIT(PG_private, "PG_private");
@@ -2276,6 +2277,7 @@ write_vmcoreinfo_data(void)
WRITE_NUMBER("NR_FREE_PAGES", NR_FREE_PAGES);
WRITE_NUMBER("N_ONLINE", N_ONLINE);
WRITE_NUMBER("pgtable_l5_enabled", pgtable_l5_enabled);
+   WRITE_NUMBER("sme_mask", sme_mask);
 
WRITE_NUMBER("PG_lru", PG_lru);
WRITE_NUMBER("PG_private", PG_private);
@@ -2672,6 +2674,7 @@ read_vmcoreinfo(void)
READ_NUMBER("NR_FREE_PAGES", NR_FREE_PAGES);
READ_NUMBER("N_ONLINE", N_ONLINE);
READ_NUMBER("pgtable_l5_enabled", pgtable_l5_enabled);
+   READ_NUMBER("sme_mask", sme_mask);
 
READ_NUMBER("PG_lru", PG_lru);
READ_NUMBER("PG_private", PG_private);
diff --git a/makedumpfile.h b/makedumpfile.h
index 73813ed..e97b2e7 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1912,6 +1912,7 @@ struct number_table {
longNR_FREE_PAGES;
longN_ONLINE;
longpgtable_l5_enabled;
+   longsme_mask;
 
/*
* Page flags
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2] makedumpfile needs to remove the memory encryption

2019-01-22 Thread Lianbo Jiang
The patchset did two things:
[1] add a new variable 'sme_mask' to number_table

The variable will be used to store the sme mask for crashed kernel,
the sme_mask denotes whether the old memory is encrypted or not.

[2] remove the memory encryption mask to obtain the true physical
address

For AMD machine with SME feature, if SME is enabled in the first
kernel, the crashed kernel's page table(pgd/pud/pmd/pte) contains
the memory encryption mask, so makedumpfile needs to remove the
memory encryption mask to obtain the true physical address.

References:

x86/kdump: Export the SME mask to vmcoreinfo
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=65f750e5457aef9a8085a99d613fea0430303e93
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=f263245a0ce2c4e23b89a58fa5f7dfc048e11929

Lianbo Jiang (2):
  Makedumpfile: add a new variable 'sme_mask' to number_table
  Remove the memory encryption mask to obtain the true physical address

 arch/x86_64.c  | 3 +++
 makedumpfile.c | 4 
 makedumpfile.h | 1 +
 3 files changed, 8 insertions(+)

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-10 Thread Lianbo Jiang
This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 Documentation/kdump/vmcoreinfo.txt | 500 +
 1 file changed, 500 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt 
b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..8e444586b87b
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,500 @@
+
+   VMCOREINFO
+
+
+===
+What is the VMCOREINFO?
+===
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+
+Common variables
+
+
+init_uts_ns.name.release
+
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built.
+
+PAGE_SIZE
+-
+
+The size of a page. It is the smallest unit of data for memory
+management in kernel. It is usually 4096 bytes and a page is aligned
+on 4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+---
+
+This is the UTS namespace, which is used to isolate two specific
+elements of the system that relate to the uname(2) system call. The UTS
+namespace is named after the data structure used to store information
+returned by the uname(2) system call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---
+
+An array node_states[N_ONLINE] which represents the set of online node
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+-
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+--
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--
+
+Stores the virtual area list. makedumpfile can get the vmalloc start
+value from this variable. This value is necessary for vmalloc translation.
+
+mem_map
+---
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+
+
+Makedumpfile can get the pglist_data structure from this symbol, which
+is used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable, both of them are used to translate an
+address.
+
+page
+
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute the contiguous memory.
+
+pglist_data
+---
+
+The size of a pglist_data structure. This value will be used to check
+if the pglist_data structure is valid. It is also used for checking the
+memory type.
+
+zone
+
+
+The size of a zone structure. This value is often used to check if the
+zone structure has been found. It is also used for excluding free pages.
+
+free_area
+-
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful for excluding free pages.
+
+list_head
+-
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
+
+nodemask_t
+--
+
+The size of a nodemask_t type. Used to compute the number of online
+nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
+   compound_order|compound_head)
+---
+
+User-space tools can compute their values based on the offset of these
+variables. The variables are helpful to exclude unnecessary pages.
+
+(pglist_data, node_zones|nr_zones|node_mem_map

[PATCH 0/2 v6] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2019-01-10 Thread Lianbo Jiang
This patchset did two things:
a. add a new document for vmcoreinfo

This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo.

b. export the value of sme mask to vmcoreinfo

For AMD machine with SME feature, makedumpfile tools need to know whether
the crashed kernel was encrypted or not. If SME is enabled in the first
kernel, the crashed kernel's page table(pgd/pud/pmd/pte) contains the
memory encryption mask, so makedumpfile needs to remove the sme mask to
obtain the true physical address.

Changes since v1:
1. No need to export a kernel-internal mask to userspace, so copy the
value of sme_me_mask to a local variable 'sme_mask' and write the value
of sme_mask to vmcoreinfo.
2. Add comment for the code.
3. Improve the patch log.
4. Add the vmcoreinfo documentation.

Changes since v2:
1. Improve the vmcoreinfo document, add more descripts for these
variables exported.
2. Fix spelling errors in the document.

Changes since v3:
1. Still improve the vmcoreinfo document, and make it become more
clear and easy to read.
2. Move sme_mask comments in the code to the vmcoreinfo document.
3. Improve patch log.

Changes since v4:
1. Remove a command that dumping the VMCOREINFO contents from this
   document.
2. Merge the 'PG_buddy' and 'PG_offline' into the PG_* flag in this
   document.
3. Correct some of the mistakes in this document.

Changes since v5:
1. Improve patch log.

Lianbo Jiang (2):
  kdump: add the vmcoreinfo documentation
  kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

 Documentation/kdump/vmcoreinfo.txt | 500 +
 arch/x86/kernel/machine_kexec_64.c |   3 +
 2 files changed, 503 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/2 v6] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2019-01-10 Thread Lianbo Jiang
For AMD machine with SME feature, makedumpfile tools need to know
whether the crashed kernel was encrypted or not. If SME is enabled
in the first kernel, the crashed kernel's page table(pgd/pud/pmd/pte)
contains the memory encryption mask, so makedumpfile needs to remove
the sme mask to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..bc4108096b18 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,10 +352,13 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+   u64 sme_mask = sme_me_mask;
+
VMCOREINFO_NUMBER(phys_base);
VMCOREINFO_SYMBOL(init_top_pgt);
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
pgtable_l5_enabled());
+   VMCOREINFO_NUMBER(sme_mask);
 
 #ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/2 v5] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2019-01-06 Thread Lianbo Jiang
For AMD machine with SME feature, makedumpfile tools need to know
whether the crash kernel was encrypted or not. If SME is enabled
in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
contains the memory encryption mask, so need to remove the sme mask
to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..bc4108096b18 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,10 +352,13 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+   u64 sme_mask = sme_me_mask;
+
VMCOREINFO_NUMBER(phys_base);
VMCOREINFO_SYMBOL(init_top_pgt);
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
pgtable_l5_enabled());
+   VMCOREINFO_NUMBER(sme_mask);
 
 #ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2 v5] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2019-01-06 Thread Lianbo Jiang
This patchset did two things:
a. add a new document for vmcoreinfo

This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it also normalizes the
exported variable as a convention between kernel and use-space.

b. export the value of sme mask to vmcoreinfo

For AMD machine with SME feature, makedumpfile tools need to know whether
the crash kernel was encrypted or not. If SME is enabled in the first
kernel, the crash kernel's page table(pgd/pud/pmd/pte) contains the
memory encryption mask, so need to remove the sme mask to obtain the true
physical address.

Changes since v1:
1. No need to export a kernel-internal mask to userspace, so copy the
value of sme_me_mask to a local variable 'sme_mask' and write the value
of sme_mask to vmcoreinfo.
2. Add comment for the code.
3. Improve the patch log.
4. Add the vmcoreinfo documentation.

Changes since v2:
1. Improve the vmcoreinfo document, add more descripts for these
variables exported.
2. Fix spelling errors in the document.

Changes since v3:
1. Still improve the vmcoreinfo document, and make it become more
clear and easy to read.
2. Move sme_mask comments in the code to the vmcoreinfo document.
3. Improve patch log.

Changes since v4:
1. Remove a command that dumping the VMCOREINFO contents from this
   document.
2. Merge the 'PG_buddy' and 'PG_offline' into the PG_* flag in this
   document.
3. Correct some of the mistakes in this document.

*** BLURB HERE ***

Lianbo Jiang (2):
  kdump: add the vmcoreinfo documentation
  kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

 Documentation/kdump/vmcoreinfo.txt | 500 +
 arch/x86/kernel/machine_kexec_64.c |   3 +
 2 files changed, 503 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2 v5] kdump: add the vmcoreinfo documentation

2019-01-06 Thread Lianbo Jiang
This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it also normalizes the
exported variables as a convention between kernel and use-space.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 Documentation/kdump/vmcoreinfo.txt | 500 +
 1 file changed, 500 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt 
b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..8e444586b87b
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,500 @@
+
+   VMCOREINFO
+
+
+===
+What is the VMCOREINFO?
+===
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+
+Common variables
+
+
+init_uts_ns.name.release
+
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built.
+
+PAGE_SIZE
+-
+
+The size of a page. It is the smallest unit of data for memory
+management in kernel. It is usually 4096 bytes and a page is aligned
+on 4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+---
+
+This is the UTS namespace, which is used to isolate two specific
+elements of the system that relate to the uname(2) system call. The UTS
+namespace is named after the data structure used to store information
+returned by the uname(2) system call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---
+
+An array node_states[N_ONLINE] which represents the set of online node
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+-
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+--
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--
+
+Stores the virtual area list. makedumpfile can get the vmalloc start
+value from this variable. This value is necessary for vmalloc translation.
+
+mem_map
+---
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+
+
+Makedumpfile can get the pglist_data structure from this symbol, which
+is used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable, both of them are used to translate an
+address.
+
+page
+
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute the contiguous memory.
+
+pglist_data
+---
+
+The size of a pglist_data structure. This value will be used to check
+if the pglist_data structure is valid. It is also used for checking the
+memory type.
+
+zone
+
+
+The size of a zone structure. This value is often used to check if the
+zone structure has been found. It is also used for excluding free pages.
+
+free_area
+-
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful for excluding free pages.
+
+list_head
+-
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
+
+nodemask_t
+--
+
+The size of a nodemask_t type. Used to compute the number of online
+nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
+   compound_order|compound_head)
+---
+
+User-space tools can compute their values based on the offset of these
+variables. The variables are helpful

[PATCH 1/2 v4] kdump: add the vmcoreinfo documentation

2018-12-19 Thread Lianbo Jiang
This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it also normalizes the
exported variables as a convention between kernel and use-space.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 Documentation/kdump/vmcoreinfo.txt | 513 +
 1 file changed, 513 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt 
b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..1f1f69143600
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,513 @@
+
+   VMCOREINFO
+
+
+===
+What is the VMCOREINFO?
+===
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+To dump the VMCOREINFO contents, one can do:
+
+# makedumpfile -g VMCOREINFO -x vmlinux
+
+
+Common variables
+
+
+init_uts_ns.name.release
+
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built.
+
+PAGE_SIZE
+-
+
+The size of a page. It is the smallest unit of data for memory
+management in kernel. It is usually 4096 bytes and a page is aligned on
+4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+---
+
+This is the UTS namespace, which is used to isolate two specific
+elements of the system that relate to the uname(2) system call. The UTS
+namespace is named after the data structure used to store information
+returned by the uname(2) system call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---
+
+An array node_states[N_ONLINE] which represents the set of online node
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+-
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+--
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--
+
+Stores the virtual area list. makedumpfile can get the vmalloc start
+value from this variable. This value is necessary for vmalloc translation.
+
+mem_map
+---
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+
+
+Makedumpfile can get the pglist_data structure from this symbol, which
+is used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable, both of them are used to translate an
+address.
+
+page
+
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute the contiguous memory.
+
+pglist_data
+---
+
+The size of a pglist_data structure. This value will be used to check
+if the pglist_data structure is valid. It is also used for checking the
+memory type.
+
+zone
+
+
+The size of a zone structure. This value is often used to check if the
+zone structure has been found. It is also used for excluding free pages.
+
+free_area
+-
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful for excluding free pages.
+
+list_head
+-
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
+
+nodemask_t
+--
+
+The size of a nodemask_t type. Used to compute the number of online
+nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
+   compound_order|compound_head)
+---
+
+User-space tools

[PATCH 0/2 v4] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-19 Thread Lianbo Jiang
This patchset did two things:
a. add a new document for vmcoreinfo

This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it also normalizes the
exported variable as a convention between kernel and use-space.

b. export the value of sme mask to vmcoreinfo

For AMD machine with SME feature, makedumpfile tools need to know whether
the crash kernel was encrypted or not. If SME is enabled in the first
kernel, the crash kernel's page table(pgd/pud/pmd/pte) contains the
memory encryption mask, so need to remove the sme mask to obtain the true
physical address.

Changes since v1:
1. No need to export a kernel-internal mask to userspace, so copy the
value of sme_me_mask to a local variable 'sme_mask' and write the value
of sme_mask to vmcoreinfo.
2. Add comment for the code.
3. Improve the patch log.
4. Add the vmcoreinfo documentation.

Changes since v2:
1. Improve the vmcoreinfo document, add more descripts for these
variables exported.
2. Fix spelling errors in the document.

Changes since v3:
1. Still improve the vmcoreinfo document, and make it become more
clear and easy to read.
2. Move sme_mask comments in the code to the vmcoreinfo document.
3. Improve patch log.

Lianbo Jiang (2):
  kdump: add the vmcoreinfo documentation
  kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

 Documentation/kdump/vmcoreinfo.txt | 513 +
 arch/x86/kernel/machine_kexec_64.c |   3 +
 2 files changed, 516 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/2 v4] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-19 Thread Lianbo Jiang
For AMD machine with SME feature, makedumpfile tools need to know
whether the crash kernel was encrypted or not. If SME is enabled
in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
contains the memory encryption mask, so need to remove the sme mask
to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..bc4108096b18 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,10 +352,13 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+   u64 sme_mask = sme_me_mask;
+
VMCOREINFO_NUMBER(phys_base);
VMCOREINFO_SYMBOL(init_top_pgt);
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
pgtable_l5_enabled());
+   VMCOREINFO_NUMBER(sme_mask);
 
 #ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2 v3] kdump: add the vmcoreinfo documentation

2018-12-16 Thread Lianbo Jiang
This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it would normalize the
exported variable as a standard ABI between kernel and use-space.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 Documentation/kdump/vmcoreinfo.txt | 456 +
 1 file changed, 456 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt 
b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..d71260bf383a
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,456 @@
+
+   Documentation for VMCOREINFO
+
+
+===
+What is the VMCOREINFO?
+===
+It is a special ELF note section. The VMCOREINFO contains the first
+kernel's various information, for example, structure size, page size,
+symbol values and field offset, etc. These data are packed into an ELF
+note section, and these data will also help user-space tools(e.g. crash
+makedumpfile) analyze the first kernel's memory usage.
+
+In general, makedumpfile can dump the VMCOREINFO contents from vmlinux
+in the first kernel. For example:
+# makedumpfile -g VMCOREINFO -x vmlinux
+
+
+Common variables
+
+
+init_uts_ns.name.release
+
+The number of OS release. Based on this version number, people can find
+the source code for the corresponding version. When analyzing the vmcore,
+people must read the source code to find the reason why the kernel crashed.
+
+PAGE_SIZE
+=
+The size of a page. It is the smallest unit of data for memory management
+in kernel. It is usually 4k bytes and the page is aligned in 4k bytes,
+which is very important for computing address.
+
+init_uts_ns
+===
+This is the UTS namespace, which is used to isolate two specific elements
+of the system that relate to the uname system call. The UTS namespace is
+named after the data structure used to store information returned by the
+uname system call.
+
+User-space tools can get the kernel name, host name, kernel release number,
+kernel version, architecture name and OS type from the 'init_uts_ns'.
+
+node_online_map
+===
+It is a macro definition, actually it is an array node_states[N_ONLINE],
+and it represents the set of online node in a system, one bit position
+per node number.
+
+This is used to keep track of which nodes are in the system and online.
+
+swapper_pg_dir
+=
+It generally indicates the pgd for the kernel. When mmu is enabled in
+config file, the 'swapper_pg_dir' is valid.
+
+The 'swapper_pg_dir' helps to translate the virtual address to a physical
+address.
+
+_stext
+==
+It is an assemble symbol that defines the beginning of the text section.
+In general, the '_stext' indicates the kernel start address. This is used
+to convert a virtual address to a physical address when the virtual address
+does not belong to the 'vmalloc' address.
+
+vmap_area_list
+==
+It stores the virtual area list, makedumpfile can get the vmalloc start
+value from this variable. This value is necessary for vmalloc translation.
+
+mem_map
+===
+Physical addresses are translated to struct pages by treating them as an
+index into the mem_map array. Shifting a physical address PAGE_SHIFT bits
+to the right will treat it as a PFN from physical address 0, which is also
+an index within the mem_map array.
+
+In short, it can map the address to struct page.
+
+contig_page_data
+
+Makedumpfile can get the pglist_data structure from this symbol
+'contig_page_data'. The pglist_data structure is used to describe the
+memory layout.
+
+User-space tools can use this symbols for excluding free pages.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+==
+Export the address of 'mem_section' array, and it's length, structure size,
+and the 'section_mem_map' offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable, both of them will help to translate
+the address.
+
+page
+
+The size of a 'page' structure. In kernel, the page is an important data
+structure, it is widely used to compute the continuous memory.
+
+pglist_data
+===
+The size of a 'pglist_data' structure. This value will be used to check if
+the 'pglist_data' structure is valid. It is also one of the conditions for
+checking the memory type.
+
+zone
+
+The size of a 'zone' structure. This value is often used to check if the
+'zone' structure is found. It is necessary structures for excluding free
+pages.
+
+free_area

[PATCH 2/2 v3] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-16 Thread Lianbo Jiang
For AMD machine with SME feature, makedumpfile tools need to know
whether the crash kernel was encrypted or not. If SME is enabled
in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
contains the memory encryption mask, so need to remove the sme mask
to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..1860fe24117d 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+   u64 sme_mask = sme_me_mask;
+
VMCOREINFO_NUMBER(phys_base);
VMCOREINFO_SYMBOL(init_top_pgt);
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
pgtable_l5_enabled());
+   /*
+* Currently, the local variable 'sme_mask' stores the value of
+* sme_me_mask(bit 47), and also write the value of sme_mask to
+* the vmcoreinfo.
+* If need, the bit(sme_mask) might be redefined in the future,
+* but the 'bit63' will be reserved.
+* For example:
+* [ misc  ][ enc bit  ][ other misc SME info   ]
+* ____1000______..._
+* 63   59   55   51   47   43   39   35   31   27   ... 3
+*/
+   VMCOREINFO_NUMBER(sme_mask);
 
 #ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2 v3] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-16 Thread Lianbo Jiang
This patchset did two things:
a. add a new document for vmcoreinfo

This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it would normalize the
exported variable as a standard ABI between kernel and use-space.

b. export the value of sme mask to vmcoreinfo

For AMD machine with SME feature, makedumpfile tools need to know whether
the crash kernel was encrypted or not. If SME is enabled in the first
kernel, the crash kernel's page table(pgd/pud/pmd/pte) contains the
memory encryption mask, so need to remove the sme mask to obtain the true
physical address.

Changes since v1:
1. No need to export a kernel-internal mask to userspace, so copy the
value of sme_me_mask to a local variable 'sme_mask' and write the value
of sme_mask to vmcoreinfo.
2. Add comment for the code.
3. Improve the patch log.
4. Add the vmcoreinfo documentation.

Changes since v2:
1. Improve the vmcoreinfo document, add more descripts for these
variables exported.
2. Fix spelling errors in the document.

Lianbo Jiang (2):
  kdump: add the vmcoreinfo documentation
  kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

 Documentation/kdump/vmcoreinfo.txt | 456 +
 arch/x86/kernel/machine_kexec_64.c |  14 +
 2 files changed, 470 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2 v2] kdump: add the vmcoreinfo documentation

2018-12-01 Thread Lianbo Jiang
This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it also normalizes the
exported variable as a standard ABI between kernel and use-space.

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
---
 Documentation/kdump/vmcoreinfo.txt | 400 +
 1 file changed, 400 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt 
b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..c6759be14af7
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,400 @@
+
+   Documentation for Vmcoreinfo
+
+
+===
+What is the vmcoreinfo?
+===
+The vmcoreinfo contains the first kernel's various information, for
+example, structure size, page size, symbol values and field offset,
+etc. These data are encapsulated into an elf format, and these data
+will also help user-space tools(e.g. makedumpfile, crash) analyze the
+first kernel's memory usage.
+
+
+Common variables
+
+
+init_uts_ns.name.release
+
+The number of OS release.
+
+PAGE_SIZE
+=
+The size of a page. It is usually 4k bytes.
+
+init_uts_ns
+===
+This is the UTS namespace, which is used to isolate two specific elements
+of the system that relate to the uname system call. The UTS namespace is
+named after the data structure used to store information returned by the
+uname system call.
+
+node_online_map
+===
+It is a macro definition, actually it is an arrary node_states[N_ONLINE],
+and it represents the set of online node in a system, one bit position
+per node number.
+
+swapper_pg_dir
+=
+It is always an array, it gerenally stands for the pgd for the kernel.
+When mmu is enabled in config file, the 'swapper_pg_dir' is valid.
+
+_stext
+==
+It is an assemble directive that defines the beginning of the text section.
+In gerenal, the '_stext' indicates the kernel start address.
+
+vmap_area_list
+==
+It stores the virtual area list, makedumpfile can get the vmalloc start
+value according to this variable.
+
+mem_map
+===
+Physical addresses are translated to struct pages by treating them as an
+index into the mem_map array. Shifting a physical address PAGE_SHIFT bits
+to the right will treat it as a PFN from physical address 0, which is also
+an index within the mem_map array.
+
+In a word, it can map the address to struct page.
+
+contig_page_data
+
+Makedumpfile can get the pglist_data structure according to this symbol
+'contig_page_data'. The pglist_data structure is used to describe the
+memory layout.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+==
+Export the address of 'mem_section' array, and it's length, structure size,
+and the 'section_mem_map' offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable, both of them will help to translate
+the address.
+
+page
+
+The size of a 'page' structure.
+
+pglist_data
+===
+The size of a 'pglist_data' structure.
+
+zone
+
+The size of a 'zone' structure.
+
+free_area
+=
+The size of a 'free_area' structure.
+
+list_head
+=
+The size of a 'list_head' structure.
+
+nodemask_t
+==
+The size of a 'nodemask_t' type.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
+   compound_order|compound_head)
+===
+The page structure is a familiar concept for most of linuxer, there is no
+need to explain too much. To know more information, please refer to the
+definition of the page struct(include/linux/mm_types.h).
+
+(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_
+  spanned_pages|node_id)
+===
+On NUMA machines, each NUMA node would have a pg_data_t to describe
+it's memory layout. On UMA machines there is a single pglist_data which
+describes the whole memory.
+
+The pglist_data structure contains these varibales, here export their
+offset in the pglist_data structure, which is defined in this file
+"include/linux/mmzone.h".
+
+(zone, free_area|vm_stat|spanned_pages)
+===
+The offset of these variables in the structure zone.
+
+Each node is divided up into a number of blocks called zones which
+represent ranges within memory. A zone is described by a structure zone.
+Each zone type is suitable for a different type of usage.
+
+

[PATCH 2/2 v2] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-01 Thread Lianbo Jiang
For AMD machine with SME feature, makedumpfile tools need to know
whether the crash kernel was encrypted or not. If SME is enabled
in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
contains the memory encryption mask, so need to remove the sme mask
to obtain the true physical address.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/machine_kexec_64.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..1860fe24117d 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+   u64 sme_mask = sme_me_mask;
+
VMCOREINFO_NUMBER(phys_base);
VMCOREINFO_SYMBOL(init_top_pgt);
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
pgtable_l5_enabled());
+   /*
+* Currently, the local variable 'sme_mask' stores the value of
+* sme_me_mask(bit 47), and also write the value of sme_mask to
+* the vmcoreinfo.
+* If need, the bit(sme_mask) might be redefined in the future,
+* but the 'bit63' will be reserved.
+* For example:
+* [ misc  ][ enc bit  ][ other misc SME info   ]
+* ____1000______..._
+* 63   59   55   51   47   43   39   35   31   27   ... 3
+*/
+   VMCOREINFO_NUMBER(sme_mask);
 
 #ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2 v2] kdump, vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-01 Thread Lianbo Jiang
This patchset did two things:
a. add a new document for vmcoreinfo

This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo, and it also normalizes the
exported variable as a standard ABI between kernel and use-space.

b. export the value of sme mask to vmcoreinfo

For AMD machine with SME feature, makedumpfile tools need to know whether
the crash kernel was encrypted or not. If SME is enabled in the first
kernel, the crash kernel's page table(pgd/pud/pmd/pte) contains the
memory encryption mask, so need to remove the sme mask to obtain the true
physical address.

Changes since v1:
1. No need to export a kernel-internal mask to userspace, so copy the
value of sme_me_mask to a local variable 'sme_mask' and write the value
of sme_mask to vmcoreinfo.
2. Add comment for the code.
3. Improve the patch log.
4. Add the vmcoreinfo documentation.

Lianbo Jiang (2):
  kdump: add the vmcoreinfo documentation
  kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

 Documentation/kdump/vmcoreinfo.txt | 400 +
 arch/x86/kernel/machine_kexec_64.c |  14 +
 2 files changed, 414 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/2 v8] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2018-11-29 Thread Lianbo Jiang
At present, when use the kexec_file_load syscall to load the kernel image
and initramfs(for example: kexec -s -p xxx), kernel does not pass the e820
reserved ranges to the second kernel, which might cause two problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservation via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the
actually fails for everyone, but devices in segment 0 are then found by
some legacy lookup method.) The workaround for this is to pass the I/O
reserved regions to the kdump kernel.

MMCONFIG(aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be albe to access devices on
non-0 segments, (c) you won't be able to access extended config space(
address 0x100-0x), which means none of the Extended Capabilities will
be available(AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in kdump kernel, actually, those
reserved regions are still decrypted, but because those reserved ranges are
not present at all in kdump kernel e820 table, those reserved regions are
considered as encrypted, it goes wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to kdump kernel.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index f631a3f15587..5354a84f1684 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -380,6 +380,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, ,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, ,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2 v8] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-29 Thread Lianbo Jiang
When doing kexec_file_load, the first kernel needs to pass the e820
reserved ranges to the second kernel. But kernel can not exactly
match the e820 reserved ranges when walking through the iomem resources
with the descriptor 'IORES_DESC_NONE', because several e820 types(
e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
_TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
may pass these four types to the kdump kernel, that is not desired result.

So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
these code originally related to the descriptor 'IORES_DESC_NONE' need to
be updated. Otherwise, it will be easily confused and also cause some
errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
changed.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/ia64/kernel/efi.c |  4 
 arch/x86/kernel/e820.c |  2 +-
 arch/x86/mm/ioremap.c  | 13 -
 include/linux/ioport.h |  1 +
 kernel/resource.c  |  6 +++---
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 8f106638913c..1841e9b4db30 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
*code_resource,
break;
 
case EFI_RESERVED_TYPE:
+   name = "reserved";
+   desc = IORES_DESC_RESERVED;
+   break;
+
case EFI_RUNTIME_SERVICES_CODE:
case EFI_RUNTIME_SERVICES_DATA:
case EFI_ACPI_RECLAIM_MEMORY:
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 50895c2f937d..57fafdafb860 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1048,10 +1048,10 @@ static unsigned long __init 
e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return 
IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 5378d10f1d31..fea2ef99415d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -83,7 +83,18 @@ static bool __ioremap_check_ram(struct resource *res)
 
 static int __ioremap_check_desc_other(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   /*
+* But now, the 'E820_TYPE_RESERVED' type is converted to the new
+* descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE',
+* it has been changed. And the value of 'mem_flags.desc_other'
+* is equal to 'true' if we don't strengthen the condition in this
+* function, that is wrong. Because originally it is equal to
+* 'false' for the same reserved type.
+*
+* So, that would be nice to keep it the same as before.
+*/
+   return ((res->desc != IORES_DESC_NONE) &&
+   (res->desc != IORES_DESC_RESERVED));
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
diff --git a/kernel/resource.c b/kernel/resource.c
index b0fbf685c77a..f34a632c4169 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -994,7 +994,7 @@ __reserve_region_with_split(struct resource *root, 
resource_size_t start,
res->start = start;
res->end = end;
res->flags = type | IORESOURCE_BUSY;
-   res->desc = IORES_DESC_NONE;
+   res->desc = IORES_DESC_RESERVED;
 
while (1) {
 
@@ -1029,7 +1029,7 @@ __reserve_region_with_split(struct resource *root, 
resource_size_t start,
 

[PATCH 0/2 v8] add reserved e820 ranges to the kdump kernel e820 table

2018-11-29 Thread Lianbo Jiang
This patchset did two things:
a). add a new I/O resource descriptor 'IORES_DESC_RESERVED'

When doing kexec_file_load, the first kernel needs to pass the e820
reserved ranges to the second kernel. But kernel can not exactly
match the e820 reserved ranges when walking through the iomem resources
with the descriptor 'IORES_DESC_NONE', because several e820 types(
e.g. E820_TYPE_RESERVED_KERN/E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820
_TYPE_RESERVED) are converted to the descriptor 'IORES_DESC_NONE'. It
may pass these four types to the kdump kernel, that is not desired result.

So, this patch adds a new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

In addition, since the new descriptor 'IORES_DESC_RESERVED' is introduced,
these code originally related to the descriptor 'IORES_DESC_NONE' need to
be updated. Otherwise, it will be easily confused and also cause some
errors. Because the 'E820_TYPE_RESERVED' type is converted to the new
descriptor 'IORES_DESC_RESERVED' instead of 'IORES_DESC_NONE', it has been
changed.

b). add the e820 reserved ranges to kdump kernel e820 table

At present, when use the kexec_file_load syscall to load the kernel image
and initramfs(for example: kexec -s -p xxx), kernel does not pass the e820
reserved ranges to the second kernel, which might cause two problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservation via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the
actually fails for everyone, but devices in segment 0 are then found by
some legacy lookup method.) The workaround for this is to pass the I/O
reserved regions to the kdump kernel.

MMCONFIG(aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be albe to access devices on
non-0 segments, (c) you won't be able to access extended config space(
address 0x100-0x), which means none of the Extended Capabilities will
be available(AER, ACS, ATS, etc). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in kdump kernel, actually, those
reserved regions are still decrypted, but because those reserved ranges are
not present at all in kdump kernel e820 table, those reserved regions are
considered as encrypted, it goes wrong.

The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite these patches log.

Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also updates these codes relates to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()


Lianbo Jiang (2):
  resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/ia64/kernel/efi.c  |  4 
 arch/x86/kernel/crash.c |  6 ++
 arch/x86/kernel/e820.c  |  2 +-
 arch/x86/mm/ioremap.c   | 13 -
 include/linux/ioport.h  |  1 +
 kernel/resource.c   |  6 +++---
 6 files changed, 27 insertions(+), 5 deletions(-)

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/2 RESEND v7] add reserved e820 ranges to the kdump kernel e820 table

2018-11-23 Thread Lianbo Jiang
These patches add the new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces, and in order to make it still
work after the new descriptor is added, these codes originally related
to 'IORES_DESC_NONE' have been updated.

In addition, for the MMCONFIG issue and the SME kdump issue, it is
necessary to pass the e820 reserved ranges to kdump kernel.

Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.

Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.

Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
   resources. Please refer to this commit <010a93bf97c7> "resource: Fix
   find_next_iomem_res() iteration issue"

Changes since v4:
1. Improve the patch log, and add kernel log.

Changes since v5:
1. Rewrite these patches log.

Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces,
   and also updates these codes relates to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
   new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.

Lianbo Jiang (2):
  resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'
  x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

 arch/ia64/kernel/efi.c  | 4 
 arch/x86/kernel/crash.c | 6 ++
 arch/x86/kernel/e820.c  | 2 +-
 arch/x86/mm/ioremap.c   | 9 -
 include/linux/ioport.h  | 1 +
 kernel/resource.c   | 6 +++---
 6 files changed, 23 insertions(+), 5 deletions(-)

-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/2 RESEND v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-23 Thread Lianbo Jiang
The upstream kernel can not accurately add the e820 reserved type to
kdump krenel e820 table.

Kdump uses walk_iomem_res_desc() to iterate io resources, then adds
the matched resource ranges to the e820 table for kdump kernel. But,
when convert the e820 type to the iores descriptor, several e820
types are converted to 'IORES_DESC_NONE' in this function e820_type
_to_iores_desc(). So the walk_iomem_res_desc() will get the redundant
types(such as E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_KERN) when
walk through io resources with the descriptor 'IORES_DESC_NONE'.

This patch adds the new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

Furthermore, in order to make it still work after the new descriptor
is added, these codes originally related to 'IORES_DESC_NONE' have
been updated.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/ia64/kernel/efi.c | 4 
 arch/x86/kernel/e820.c | 2 +-
 arch/x86/mm/ioremap.c  | 9 -
 include/linux/ioport.h | 1 +
 kernel/resource.c  | 6 +++---
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 8f106638913c..1841e9b4db30 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -1231,6 +1231,10 @@ efi_initialize_iomem_resources(struct resource 
*code_resource,
break;
 
case EFI_RESERVED_TYPE:
+   name = "reserved";
+   desc = IORES_DESC_RESERVED;
+   break;
+
case EFI_RUNTIME_SERVICES_CODE:
case EFI_RUNTIME_SERVICES_DATA:
case EFI_ACPI_RECLAIM_MEMORY:
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 50895c2f937d..57fafdafb860 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1048,10 +1048,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 5378d10f1d31..91b6112e7489 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -83,7 +83,14 @@ static bool __ioremap_check_ram(struct resource *res)
 
 static int __ioremap_check_desc_other(struct resource *res)
 {
-   return (res->desc != IORES_DESC_NONE);
+   /*
+    * E820_TYPE_RESERVED was converted to IORES_DESC_NONE before the
+    * new IORES_DESC_RESERVED was added, so IORES_DESC_NONE used to
+    * cover the e820 reserved type. To keep the SEV behaviour the same
+    * as before, treat IORES_DESC_RESERVED like IORES_DESC_NONE here.
+    */
+   return ((res->desc != IORES_DESC_NONE) &&
+   (res->desc != IORES_DESC_RESERVED));
 }
 
 static int __ioremap_res_check(struct resource *res, void *arg)
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
diff --git a/kernel/resource.c b/kernel/resource.c
index b0fbf685c77a..f34a632c4169 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -994,7 +994,7 @@ __reserve_region_with_split(struct resource *root, resource_size_t start,
res->start = start;
res->end = end;
res->flags = type | IORESOURCE_BUSY;
-   res->desc = IORES_DESC_NONE;
+   res->desc = IORES_DESC_RESERVED;
 
while (1) {
 
@@ -1029,7 +1029,7 @@ __reserve_region_with_split(struct resource *root, resource_size_t start,
next_res->start = conflict->end + 1;
next_res->end = end;
next_res->flags = type | IORESOURCE_BUSY;
-   next_res->desc = IORES_DESC_NONE;
+   next_res->desc = IORES_DESC_RESERVED;
}
} else {
res->start = conflict->end + 1;

[PATCH 2/2 RESEND v7] x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table

2018-11-23 Thread Lianbo Jiang
At present, when using the kexec_file_load syscall to load the kernel image
and initramfs (for example: kexec -s -p xxx), the upstream kernel does not
pass the e820 reserved ranges to the second kernel, which might cause two
problems:

The first one is the MMCONFIG issue. The basic problem is that this device
is in PCI segment 1 and the kernel PCI probing can not find it without all
the e820 I/O reservations being present in the e820 table. And the kdump
kernel does not have those reservations because the kexec command does not
pass the I/O reservations via the "memmap=xxx" command line option. (This
problem does not show up for other vendors, as SGI is apparently the only
one placing devices outside PCI segment 0; MMCONFIG access actually fails
for everyone, but devices in segment 0 are then found by some legacy
lookup method.) The workaround for this is to pass the I/O reserved
regions to the kdump kernel.

MMCONFIG (aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be able to access devices on
non-0 segments, (c) you won't be able to access extended config space
(addresses 0x100-0xfff), which means none of the Extended Capabilities will
be available (AER, ACS, ATS, etc.). [Bjorn's comment]

The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in the kdump kernel, those
reserved regions are actually still unencrypted, but because they are not
present at all in the kdump kernel's e820 table, they are treated as
encrypted, which goes wrong.

The e820 reserved ranges are useful in the kdump kernel, so it is
necessary to pass them to the kdump kernel.
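
For context, memmap_entry_callback() used in the hunk below is the existing
kexec/kdump helper that copies each matched range into the boot_params e820
table handed to the kdump kernel. Its body is not part of this patch; the
sketch below only approximates that pattern, and the *_sketch names are made
up for illustration:

#include <linux/ioport.h>
#include <asm/bootparam.h>
#include <asm/e820/types.h>

/* Carries the target boot_params and the e820 type to record. */
struct crash_memmap_data_sketch {
	struct boot_params *params;
	unsigned int type;
};

/* Append one entry to the e820 table handed to the kdump kernel. */
static int add_e820_entry_sketch(struct boot_params *params,
				 struct e820_entry *entry)
{
	unsigned int nr = params->e820_entries;

	if (nr >= E820_MAX_ENTRIES_ZEROPAGE)
		return 1;	/* the table in the zero page is full */

	params->e820_table[nr].addr = entry->addr;
	params->e820_table[nr].size = entry->size;
	params->e820_table[nr].type = entry->type;
	params->e820_entries++;
	return 0;
}

/* Called once per matching iomem resource by walk_iomem_res_desc(). */
static int memmap_entry_cb_sketch(struct resource *res, void *arg)
{
	struct crash_memmap_data_sketch *cmd = arg;
	struct e820_entry ei;

	ei.addr = res->start;
	ei.size = resource_size(res);
	ei.type = cmd->type;	/* E820_TYPE_RESERVED for the new walk */

	return add_e820_entry_sketch(cmd->params, &ei);
}

crash_setup_memmap_entries() simply repeats walk_iomem_res_desc() once per
descriptor with cmd.type set accordingly, which is the pattern the hunk
below extends to IORES_DESC_RESERVED.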

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
 arch/x86/kernel/crash.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index f631a3f15587..5354a84f1684 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -380,6 +380,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
 
+   /* Add e820 reserved ranges */
+   cmd.type = E820_TYPE_RESERVED;
+   flags = IORESOURCE_MEM;
+   walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+  memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
-- 
2.17.1




[PATCH 1/2 v7] resource: add the new I/O resource descriptor 'IORES_DESC_RESERVED'

2018-11-15 Thread Lianbo Jiang
The upstream kernel can not accurately add the e820 reserved type to the
kdump kernel's e820 table.

Kdump uses walk_iomem_res_desc() to iterate io resources and then adds
the matched resource ranges to the e820 table for the kdump kernel. But
when the e820 type is converted into the iores descriptor, several e820
types are converted to 'IORES_DESC_NONE' in e820_type_to_iores_desc().
So walk_iomem_res_desc() will pick up unnecessary types (such as
E820_TYPE_RAM/E820_TYPE_UNUSABLE/E820_TYPE_RESERVED_KERN) when walking
through io resources by the descriptor 'IORES_DESC_NONE'.

This patch adds the new I/O resource descriptor 'IORES_DESC_RESERVED'
for the iomem resources search interfaces. It is helpful to exactly
match the reserved resource ranges when walking through iomem resources.

Suggested-by: Dave Young 
Signed-off-by: Lianbo Jiang 
---
Changes since v5:
1. Improved the patch log.

Changes since v6:
1. Modified this patch: added the new I/O resource descriptor
   'IORES_DESC_RESERVED' for the iomem resources search interfaces.
2. Improved the patch log.

 arch/x86/kernel/e820.c | 2 +-
 include/linux/ioport.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 50895c2f937d..57fafdafb860 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1048,10 +1048,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM:return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM:return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+   case E820_TYPE_RESERVED:return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE:/* Fall-through: */
-   case E820_TYPE_RESERVED:/* Fall-through: */
default:return IORES_DESC_NONE;
}
 }
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+   IORES_DESC_RESERVED = 8,
 };
 
 /* helpers to define resources */
-- 
2.17.1



