[PATCH v15 03/16] s390, kexec_file: drop arch_kexec_mem_walk()
Since s390 already knows where to locate buffers, calling arch_kexec_mem_walk() has no sense. So we can just drop it as kbuf->mem indicates this while all other architectures sets it to 0 initially. This change is a preparatory work for the next patch, where all the variant memory walks, either on system resource or memblock, will be put in one common place so that it will satisfy all the architectures' need. Signed-off-by: AKASHI Takahiro Reviewed-by: Philipp Rudo Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Dave Young Cc: Vivek Goyal Cc: Baoquan He --- arch/s390/kernel/machine_kexec_file.c | 10 -- include/linux/kexec.h | 8 kernel/kexec_file.c | 4 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/s390/kernel/machine_kexec_file.c b/arch/s390/kernel/machine_kexec_file.c index f413f57f8d20..32023b4f9dc0 100644 --- a/arch/s390/kernel/machine_kexec_file.c +++ b/arch/s390/kernel/machine_kexec_file.c @@ -134,16 +134,6 @@ int kexec_file_add_initrd(struct kimage *image, struct s390_load_data *data, return ret; } -/* - * The kernel is loaded to a fixed location. Turn off kexec_locate_mem_hole - * and provide kbuf->mem by hand. - */ -int arch_kexec_walk_mem(struct kexec_buf *kbuf, - int (*func)(struct resource *, void *)) -{ - return 1; -} - int arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section, const Elf_Shdr *relsec, diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 49ab758f4d91..f378cb786f1b 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -145,6 +145,14 @@ int kexec_image_probe_default(struct kimage *image, void *buf, unsigned long buf_len); int kexec_image_post_load_cleanup_default(struct kimage *image); +/* + * If kexec_buf.mem is set to this value, kexec_locate_mem_hole() + * will try to allocate free memory. Arch may overwrite it. + */ +#ifndef KEXEC_BUF_MEM_UNKNOWN +#define KEXEC_BUF_MEM_UNKNOWN 0 +#endif + /** * struct kexec_buf - parameters for finding a place for a buffer in memory * @image: kexec image in which memory to search. diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 63c7ce1c0c3e..0fcaa86219d1 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -534,6 +534,10 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf) { int ret; + /* Arch knows where to place */ + if (kbuf->mem != KEXEC_BUF_MEM_UNKNOWN) + return 0; + ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback); return ret == 1 ? 0 : -EADDRNOTAVAIL; -- 2.19.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v15 02/16] kexec_file: make kexec_image_post_load_cleanup_default() global
Change this function from static to global so that arm64 can implement its own arch_kimage_file_post_load_cleanup() later using kexec_image_post_load_cleanup_default(). Signed-off-by: AKASHI Takahiro Acked-by: Dave Young Cc: Vivek Goyal Cc: Baoquan He --- include/linux/kexec.h | 1 + kernel/kexec_file.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 9e4e638fb505..49ab758f4d91 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -143,6 +143,7 @@ extern const struct kexec_file_ops * const kexec_file_loaders[]; int kexec_image_probe_default(struct kimage *image, void *buf, unsigned long buf_len); +int kexec_image_post_load_cleanup_default(struct kimage *image); /** * struct kexec_buf - parameters for finding a place for a buffer in memory diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index c6a3b6851372..63c7ce1c0c3e 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -78,7 +78,7 @@ void * __weak arch_kexec_kernel_image_load(struct kimage *image) return kexec_image_load_default(image); } -static int kexec_image_post_load_cleanup_default(struct kimage *image) +int kexec_image_post_load_cleanup_default(struct kimage *image) { if (!image->fops || !image->fops->cleanup) return 0; -- 2.19.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v15 04/16] powerpc, kexec_file: factor out memblock-based arch_kexec_walk_mem()
Memblock list is another source for usable system memory layout. So move powerpc's arch_kexec_walk_mem() to common code so that other memblock-based architectures, particularly arm64, can also utilise it. A moved function is now renamed to kexec_walk_memblock() and integrated into kexec_locate_mem_hole(), which will now be usable for all architectures with no need for overriding arch_kexec_walk_mem(). With this change, arch_kexec_walk_mem() need no longer be a weak function, and was now renamed to kexec_walk_resources(). Since powerpc doesn't support kdump in its kexec_file_load(), the current kexec_walk_memblock() won't work for kdump either in this form, this will be fixed in the next patch. Signed-off-by: AKASHI Takahiro Cc: "Eric W. Biederman" Acked-by: Dave Young Cc: Vivek Goyal Cc: Baoquan He Acked-by: James Morse --- arch/powerpc/kernel/machine_kexec_file_64.c | 54 --- include/linux/kexec.h | 2 - kernel/kexec_file.c | 60 +++-- 3 files changed, 57 insertions(+), 59 deletions(-) diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c b/arch/powerpc/kernel/machine_kexec_file_64.c index c77e95e9b384..0d20c7ad40fa 100644 --- a/arch/powerpc/kernel/machine_kexec_file_64.c +++ b/arch/powerpc/kernel/machine_kexec_file_64.c @@ -24,7 +24,6 @@ #include #include -#include #include #include #include @@ -46,59 +45,6 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, return kexec_image_probe_default(image, buf, buf_len); } -/** - * arch_kexec_walk_mem - call func(data) for each unreserved memory block - * @kbuf: Context info for the search. Also passed to @func. - * @func: Function to call for each memory block. - * - * This function is used by kexec_add_buffer and kexec_locate_mem_hole - * to find unreserved memory to load kexec segments into. - * - * Return: The memory walk will stop when func returns a non-zero value - * and that value will be returned. If all free regions are visited without - * func returning non-zero, then zero will be returned. - */ -int arch_kexec_walk_mem(struct kexec_buf *kbuf, - int (*func)(struct resource *, void *)) -{ - int ret = 0; - u64 i; - phys_addr_t mstart, mend; - struct resource res = { }; - - if (kbuf->top_down) { - for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0, - &mstart, &mend, NULL) { - /* -* In memblock, end points to the first byte after the -* range while in kexec, end points to the last byte -* in the range. -*/ - res.start = mstart; - res.end = mend - 1; - ret = func(&res, kbuf); - if (ret) - break; - } - } else { - for_each_free_mem_range(i, NUMA_NO_NODE, 0, &mstart, &mend, - NULL) { - /* -* In memblock, end points to the first byte after the -* range while in kexec, end points to the last byte -* in the range. -*/ - res.start = mstart; - res.end = mend - 1; - ret = func(&res, kbuf); - if (ret) - break; - } - } - - return ret; -} - /** * setup_purgatory - initialize the purgatory's global variables * @image: kexec image. diff --git a/include/linux/kexec.h b/include/linux/kexec.h index f378cb786f1b..d58d1f2fab10 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -192,8 +192,6 @@ int __weak arch_kexec_apply_relocations(struct purgatory_info *pi, const Elf_Shdr *relsec, const Elf_Shdr *symtab); -int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf, - int (*func)(struct resource *, void *)); extern int kexec_add_buffer(struct kexec_buf *kbuf); int kexec_locate_mem_hole(struct kexec_buf *kbuf); diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 0fcaa86219d1..370d7eab49fe 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -501,6 +502,55 @@ static int locate_mem_hole_callback(struct resource *res, void *arg) return locate_mem_hole_bottom_up(start, end, kbuf); } +#if defined(CONFIG_HAVE_MEMBLOCK) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK) +static int kexec_walk_memblock(struct kexec_buf *kbuf, + int (*func)(struct resource *, void *)) +{ + int ret =
[PATCH v15 00/16] arm64: kexec: add kexec_file_load() support
This is the fifteenth round of implementing kexec_file_load() support on arm64.[1] (See "Changes" below) Most of the code is based on kexec-tools. # Since v15, we need a few prerequisite patches; See "Changes." # You will find them in [1], too. This patch series enables us to * load the kernel by specifying its file descriptor, instead of user- filled buffer, at kexec_file_load() system call, and * optionally verify its signature at load time for trusted boot. Kernel virtual address randomization is also supported since v9. Contrary to kexec_load() system call, as we discussed a long time ago, users may not be allowed to provide a device tree to the 2nd kernel explicitly, hence enforcing a dt blob of the first kernel to be re-used internally. To use kexec_file_load() system call, instead of kexec_load(), at kexec command, '-s' option must be specified. See [2] for a necessary patch for kexec-tools. To analyze a generated crash dump file, use the latest master branch of crash utility[3]. I always try to submit patches to fix any inconsistencies introduced in the latest kernel. Regarding a kernel image verification, a signature must be presented along with the binary itself. A signature is basically a hash value calculated against the whole binary data and encrypted by a key which will be authenticated by one of the system's trusted certificates. Any attempt to read and load a to-be-kexec-ed kernel image through a system call will be checked and blocked if the binary's hash value doesn't match its associated signature. There are two methods available now: 1. implementing arch-specific verification hook of kexec_file_load() 2. utilizing IMA(Integrity Measurement Architecture)[4] appraisal framework Before my v7, I believed that my patch only supports (1) but am now confident that (2) comes free if IMA is enabled and properly configured. (1) Arch-specific verification hook If CONFIG_KEXEC_VERIFY_SIG is enabled, kexec_file_load() invokes an arch- defined (and hence file-format-specific) hook function to check for the validity of kernel binary. On x86, a signature is embedded into a PE file (Microsoft's format) header of binary. Since arm64's "Image" can also be seen as a PE file as far as CONFIG_EFI is enabled, we adopt this format for kernel signing. As in the case of UEFI applications, we can create a signed kernel image: $ sbsign --key ${KEY} --cert ${CERT} Image You may want to use certs/signing_key.pem, which is intended to be used for module signing (CONFIG_MODULE_SIG), as ${KEY} and ${CERT} for test purpose. (2) IMA appraisal-based IMA was first introduced in linux in order to meet TCG (Trusted Computing Group) requirement that all the sensitive files be *measured* before reading/executing them to detect any untrusted changes/modification. Then appraisal feature, which allows us to ensure the integrity of files and even prevent them from reading/executing, was added later. Meanwhile, kexec_file_load() has been merged since v3.17 and evolved to enable IMA-appraisal type verification by the commit b804defe4297 ("kexec: replace call to copy_file_from_fd() with kernel version"). In this scheme, a signature will be stored in a extended file attribute, "security.ima" while a decryption key is hold in a dedicated keyring, ".ima" or "_ima". All the necessary process of verification is confined in a secure API, kernel_read_file_from_fd(), called by kexec_file_load(). Please note that powerpc is one of the two architectures now supporting KEXEC_FILE, and that it wishes to extend IMA, where a signature may be appended to "vmlinux" file[5], like module signing, instead of using an extended file attribute. While IMA meant to be used with TPM (Trusted Platform Module) on secure platform, IMA is still usable without TPM. Here is an example procedure about how we can give it a try to run the feature using a self-signed root ca for demo/test purposes: 1) Generate needed keys and certificates, following "Generate trusted keys" section in README of ima-evm-utils[6]. 2) Build the kernel with the following kernel configurations, specifying "ima-local-ca.pem" for CONFIG_SYSTEM_TRUSTED_KEYS: CONFIG_EXT4_FS_SECURITY CONFIG_INTEGRITY_SIGNATURE CONFIG_INTEGRITY_ASYMMETRIC_KEYS CONFIG_INTEGRITY_TRUSTED_KEYRING CONFIG_IMA CONFIG_IMA_WRITE_POLICY CONFIG_IMA_READ_POLICY CONFIG_IMA_APPRAISE CONFIG_IMA_APPRAISE_BOOTPARAM CONFIG_SYSTEM_TRUSTED_KEYS Please note that CONFIG_KEXEC_VERIFY_SIG is not, actually should not be, enabled. 3) Sign(label) a kernel image binary to be kexec-ed on target filesystem: $ evmctl ima_sign --key /path/to/private_key.pem /your/Image 4) Add a command line parameter and boot the kernel: ima_appraise=enforce On live system, 5) Set a security policy: $ mount -t securityfs none /sys/kernel/security $ echo "appraise func=KEXEC_KERNEL_CHECK ap
[PATCH v15 01/16] asm-generic: add kexec_file_load system call to unistd.h
The initial user of this system call number is arm64. Signed-off-by: AKASHI Takahiro Acked-by: Arnd Bergmann --- include/uapi/asm-generic/unistd.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index df4bedb9b01c..acea91e49523 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -736,9 +736,11 @@ __SYSCALL(__NR_statx, sys_statx) __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents) #define __NR_rseq 293 __SYSCALL(__NR_rseq, sys_rseq) +#define __NR_kexec_file_load 294 +__SYSCALL(__NR_kexec_file_load, sys_kexec_file_load) #undef __NR_syscalls -#define __NR_syscalls 294 +#define __NR_syscalls 295 /* * 32 bit systems traditionally used different -- 2.19.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue
在 2018年09月27日 22:03, Bjorn Helgaas 写道: > On Thu, Sep 27, 2018 at 01:27:41PM +0800, lijiang wrote: >> 在 2018年09月25日 06:15, Bjorn Helgaas 写道: >>> From: Bjorn Helgaas >>> >>> Previously find_next_iomem_res() used "*res" as both an input parameter for >>> the range to search and the type of resource to search for, and an output >>> parameter for the resource we found, which makes the interface confusing >>> and hard to use correctly. >>> >>> All callers allocate a single struct resource and use it for repeated calls >>> to find_next_iomem_res(). When find_next_iomem_res() returns a resource, >>> it overwrites the start, end, flags, and desc members of the struct. If we >>> call find_next_iomem_res() again, we must update or restore these fields. >>> >>> The callers (__walk_iomem_res_desc() and walk_system_ram_range()) do not >>> restore res->flags, so if the caller is searching for flags of >>> IORESOURCE_MEM | IORESOURCE_BUSY and finds a resource with flags of >>> IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search will >>> find only resources marked as IORESOURCE_SYSRAM. >>> >>> Fix this by restructuring the interface so it takes explicit "start, end, >>> flags" parameters and uses "*res" only as an output parameter. >> >> Hi, Bjorn >> I personally suggest that some comments might be added in the code, make it >> clear >> and easy to understand, then which could avoid the old confusion and more >> code changes. > > Since I think the current interface (using *res as both input and > output parameters that have very different meanings) is confusing, > it's hard for *me* to write comments that make it less confusing, but > of course, you're welcome to propose something. > > My opinion (probably not universally shared) is that my proposal would > make the code more readable, and it's worth doing even though the diff > is larger. > > Anyway, I'll post these patches independently and see if anybody else > has an opinion. > I don't mind at all, because that is just my own opinion about more code changes. Anyway, thank you to improve this patch. Regards, Lianbo > Bjorn > >>> Original-patch: >>> http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com >>> Based-on-patch-by: Lianbo Jiang >>> Signed-off-by: Bjorn Helgaas >>> --- >>> kernel/resource.c | 94 >>> +++-- >>> 1 file changed, 41 insertions(+), 53 deletions(-) >>> >>> diff --git a/kernel/resource.c b/kernel/resource.c >>> index 155ec873ea4d..9891ea90cc8f 100644 >>> --- a/kernel/resource.c >>> +++ b/kernel/resource.c >>> @@ -319,23 +319,26 @@ int release_resource(struct resource *old) >>> EXPORT_SYMBOL(release_resource); >>> >>> /* >>> - * Finds the lowest iomem resource existing within [res->start..res->end]. >>> - * The caller must specify res->start, res->end, res->flags, and optionally >>> - * desc. If found, returns 0, res is overwritten, if not found, returns >>> -1. >>> - * This function walks the whole tree and not just first level children >>> until >>> - * and unless first_level_children_only is true. >>> + * Finds the lowest iomem resource that covers part of [start..end]. The >>> + * caller must specify start, end, flags, and desc (which may be >>> + * IORES_DESC_NONE). >>> + * >>> + * If a resource is found, returns 0 and *res is overwritten with the part >>> + * of the resource that's within [start..end]; if none is found, returns >>> + * -1. >>> + * >>> + * This function walks the whole tree and not just first level children >>> + * unless first_level_children_only is true. >>> */ >>> -static int find_next_iomem_res(struct resource *res, unsigned long desc, >>> - bool first_level_children_only) >>> +static int find_next_iomem_res(resource_size_t start, resource_size_t end, >>> + unsigned long flags, unsigned long desc, >>> + bool first_level_children_only, >>> + struct resource *res) >>> { >>> - resource_size_t start, end; >>> struct resource *p; >>> bool sibling_only = false; >>> >>> BUG_ON(!res); >>> - >>> - start = res->start; >>> - end = res->end; >>> BUG_ON(start >= end); >>> >>> if (first_level_children_only) >>> @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, >>> unsigned long desc, >>> read_lock(&resource_lock); >>> >>> for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) { >>> - if ((p->flags & res->flags) != res->flags) >>> + if ((p->flags & flags) != flags) >>> continue; >>> if ((desc != IORES_DESC_NONE) && (desc != p->desc)) >>> continue; >>> @@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, >>> unsigned long desc, >>> read_unlock(&resource_lock); >>> if (!p) >>> return -1; >>> + >>> /* copy data */ >>> - if (res->start <
Re: [PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled
在 2018年09月28日 00:53, Borislav Petkov 写道: > On Thu, Sep 27, 2018 at 03:19:52PM +0800, Lianbo Jiang wrote: >> When SME is enabled in the first kernel, we will allocate unencrypted pages >> for kdump in order to be able to boot the kdump kernel like kexec. > > This is not what the commit does - it marks the control pages as > decrypted when SME. Why doesn't the commit message state that and why is > this being done? > Ok, i will improve this patch log. If SME is active, we need to mark the control pages as decrypted, because when we boot to the kdump kernel, these pages won't be accessed encrypted at the initial stage, in order to boot the kdump kernel in the same manner as originally booted. >> Signed-off-by: Lianbo Jiang >> Reviewed-by: Tom Lendacky >> --- >> kernel/kexec_core.c | 12 >> 1 file changed, 12 insertions(+) >> >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >> index 23a83a4da38a..e7efcd1a977b 100644 >> --- a/kernel/kexec_core.c >> +++ b/kernel/kexec_core.c >> @@ -471,6 +471,16 @@ static struct page >> *kimage_alloc_crash_control_pages(struct kimage *image, >> } >> } >> >> +if (pages) { >> +/* >> + * For kdump, we need to ensure that these pages are >> + * unencrypted pages if SME is enabled. > > Remember to always call unencrypted pages "decrypted" - this is the > convention we agreed upon and it should keep the confusion level at > minimum to others staring at this code. > Ok, i will improve this comment. >> + * By the way, it is unnecessary to call the arch_ >> + * kexec_pre_free_pages(), which will make the code >> + * become more simple. >> + */ > > This second sentence I don't understand... > There are two functions that are usually called in pairs, they are: arch_kexec_post_alloc_pages() and arch_kexec_pre_free_pages(). One marks the pages as decrypted, another one marks the pages as encrypted. But for the crash control pages, no need to call arch_kexec_pre_free_pages(), there are three reasons: 1. Crash pages are reserved in memblock, these pages are only used by kdump, no other people uses these pages; 2. Whenever crash pages are allocated, these pages are always marked as decrypted(when SME is active); 3. If we plan to call the arch_kexe_pre_free_pages(), we have to store these pages to somewhere, which will have more code changes. So, here it is redundant to call the arch_kexe_pre_free_pages(). Thanks. Lianbo >> +arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0); >> +} >> return pages; >> } >> >> @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage >> *image, >> result = -ENOMEM; >> goto out; >> } >> +arch_kexec_post_alloc_pages(page_address(page), 1, 0); >> ptr = kmap(page); >> ptr += maddr & ~PAGE_MASK; >> mchunk = min_t(size_t, mbytes, >> @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage >> *image, >> result = copy_from_user(ptr, buf, uchunk); >> kexec_flush_icache_page(page); >> kunmap(page); >> +arch_kexec_pre_free_pages(page_address(page), 1); >> if (result) { >> result = -EFAULT; >> goto out; >> -- >> 2.17.1 >> > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory
在 2018年09月28日 00:10, Borislav Petkov 写道: > On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote: >> If no need to break this line, it will cause a warning of exceeding 80 >> characters per line. > > That's fine - we don't take the 80 cols rule blindly but apply common > sense. In this particular case the lines can stick out because they're > simply externs and are meant to be skimmed over and not really read. :) > Ok, i see. Thanks. >> Thank you for pointing out this issue, i forgot to remove this header file. >> >> I will get rid of this header file and post this patch again. > > No need - already fixed that up. > Great, thank you so much for your help. Regards, Lianbo > Thx. > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled
On Thu, Sep 27, 2018 at 03:19:52PM +0800, Lianbo Jiang wrote: > When SME is enabled in the first kernel, we will allocate unencrypted pages > for kdump in order to be able to boot the kdump kernel like kexec. This is not what the commit does - it marks the control pages as decrypted when SME. Why doesn't the commit message state that and why is this being done? > Signed-off-by: Lianbo Jiang > Reviewed-by: Tom Lendacky > --- > kernel/kexec_core.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index 23a83a4da38a..e7efcd1a977b 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -471,6 +471,16 @@ static struct page > *kimage_alloc_crash_control_pages(struct kimage *image, > } > } > > + if (pages) { > + /* > + * For kdump, we need to ensure that these pages are > + * unencrypted pages if SME is enabled. Remember to always call unencrypted pages "decrypted" - this is the convention we agreed upon and it should keep the confusion level at minimum to others staring at this code. > + * By the way, it is unnecessary to call the arch_ > + * kexec_pre_free_pages(), which will make the code > + * become more simple. > + */ This second sentence I don't understand... > + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0); > + } > return pages; > } > > @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image, > result = -ENOMEM; > goto out; > } > + arch_kexec_post_alloc_pages(page_address(page), 1, 0); > ptr = kmap(page); > ptr += maddr & ~PAGE_MASK; > mchunk = min_t(size_t, mbytes, > @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image, > result = copy_from_user(ptr, buf, uchunk); > kexec_flush_icache_page(page); > kunmap(page); > + arch_kexec_pre_free_pages(page_address(page), 1); > if (result) { > result = -EFAULT; > goto out; > -- > 2.17.1 > -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory
On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote: > If no need to break this line, it will cause a warning of exceeding 80 > characters per line. That's fine - we don't take the 80 cols rule blindly but apply common sense. In this particular case the lines can stick out because they're simply externs and are meant to be skimmed over and not really read. :) > Thank you for pointing out this issue, i forgot to remove this header file. > > I will get rid of this header file and post this patch again. No need - already fixed that up. Thx. -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory
在 2018年09月27日 21:17, Borislav Petkov 写道: > On Thu, Sep 27, 2018 at 03:19:51PM +0800, Lianbo Jiang wrote: >> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h >> index 6de64840dd22..f8795f9581c7 100644 >> --- a/arch/x86/include/asm/io.h >> +++ b/arch/x86/include/asm/io.h >> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t >> offset, unsigned long size); >> #define ioremap_cache ioremap_cache >> extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long >> size, unsigned long prot_val); >> #define ioremap_prot ioremap_prot >> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, >> +unsigned long size); > > No need to break this line - see how the other externs don't. > Ok, i will fix it. If no need to break this line, it will cause a warning of exceeding 80 characters per line. >> +#define ioremap_encrypted ioremap_encrypted >> >> /** >> * ioremap - map bus memory into CPU space >> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c >> index c63a545ec199..e01e6c695add 100644 >> --- a/arch/x86/mm/ioremap.c >> +++ b/arch/x86/mm/ioremap.c >> @@ -24,6 +24,7 @@ >> #include >> #include >> #include >> +#include > > What is that include for and why is it not up there with the includes instead here with the ones? > Thank you for pointing out this issue, i forgot to remove this header file. I will get rid of this header file and post this patch again. Regards, Lianbo ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue
From: Bjorn Helgaas Previously find_next_iomem_res() used "*res" as both an input parameter for the range to search and the type of resource to search for, and an output parameter for the resource we found, which makes the interface confusing. The current callers use find_next_iomem_res() incorrectly because they allocate a single struct resource and use it for repeated calls to find_next_iomem_res(). When find_next_iomem_res() returns a resource, it overwrites the start, end, flags, and desc members of the struct. If we call find_next_iomem_res() again, we must update or restore these fields. The previous code restored res.start and res.end, but not res.flags or res.desc. Since the callers did not restore res.flags, if they searched for flags IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would incorrectly skip resources unless they were also marked as IORESOURCE_SYSRAM. Fix this by restructuring the interface so it takes explicit "start, end, flags" parameters and uses "*res" only as an output parameter. Original-patch: http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com Based-on-patch-by: Lianbo Jiang Signed-off-by: Bjorn Helgaas --- kernel/resource.c | 94 +++-- 1 file changed, 41 insertions(+), 53 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 155ec873ea4d..9891ea90cc8f 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -319,23 +319,26 @@ int release_resource(struct resource *old) EXPORT_SYMBOL(release_resource); /* - * Finds the lowest iomem resource existing within [res->start..res->end]. - * The caller must specify res->start, res->end, res->flags, and optionally - * desc. If found, returns 0, res is overwritten, if not found, returns -1. - * This function walks the whole tree and not just first level children until - * and unless first_level_children_only is true. + * Finds the lowest iomem resource that covers part of [start..end]. The + * caller must specify start, end, flags, and desc (which may be + * IORES_DESC_NONE). + * + * If a resource is found, returns 0 and *res is overwritten with the part + * of the resource that's within [start..end]; if none is found, returns + * -1. + * + * This function walks the whole tree and not just first level children + * unless first_level_children_only is true. */ -static int find_next_iomem_res(struct resource *res, unsigned long desc, - bool first_level_children_only) +static int find_next_iomem_res(resource_size_t start, resource_size_t end, + unsigned long flags, unsigned long desc, + bool first_level_children_only, + struct resource *res) { - resource_size_t start, end; struct resource *p; bool sibling_only = false; BUG_ON(!res); - - start = res->start; - end = res->end; BUG_ON(start >= end); if (first_level_children_only) @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc, read_lock(&resource_lock); for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) { - if ((p->flags & res->flags) != res->flags) + if ((p->flags & flags) != flags) continue; if ((desc != IORES_DESC_NONE) && (desc != p->desc)) continue; @@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc, read_unlock(&resource_lock); if (!p) return -1; + /* copy data */ - if (res->start < p->start) - res->start = p->start; - if (res->end > p->end) - res->end = p->end; + res->start = max(start, p->start); + res->end = min(end, p->end); res->flags = p->flags; res->desc = p->desc; return 0; } -static int __walk_iomem_res_desc(struct resource *res, unsigned long desc, -bool first_level_children_only, -void *arg, +static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end, +unsigned long flags, unsigned long desc, +bool first_level_children_only, void *arg, int (*func)(struct resource *, void *)) { - u64 orig_end = res->end; + struct resource res; int ret = -1; - while ((res->start < res->end) && - !find_next_iomem_res(res, desc, first_level_children_only)) { - ret = (*func)(res, arg); + while (start < end && + !find_next_iomem_res(start, end, flags, desc, + first_level_children_only, &res)) { + ret = (*func
[PATCH 2/3] resource: Include resource end in walk_*() interfaces
From: Bjorn Helgaas find_next_iomem_res() finds an iomem resource that covers part of a range described by "start, end". All callers expect that range to be inclusive, i.e., both start and end are included, but find_next_iomem_res() doesn't handle the end address correctly. If it finds an iomem resource that contains exactly the end address, it skips it, e.g., if "start, end" is [0x0-0x1] and there happens to be an iomem resource [mem 0x1-0x1] (the single byte at 0x1), we skip it: find_next_iomem_res(...) { start = 0x0; end = 0x1; for (p = next_resource(...)) { # p->start = 0x1; # p->end = 0x1; # we *should* return this resource, but this condition is false: if ((p->end >= start) && (p->start < end)) break; Adjust find_next_iomem_res() so it allows a resource that includes the single byte at the end of the range. This is a corner case that we probably don't see in practice. Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix") Signed-off-by: Bjorn Helgaas --- kernel/resource.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 30e1bc68503b..155ec873ea4d 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -319,7 +319,7 @@ int release_resource(struct resource *old) EXPORT_SYMBOL(release_resource); /* - * Finds the lowest iomem resource existing within [res->start.res->end). + * Finds the lowest iomem resource existing within [res->start..res->end]. * The caller must specify res->start, res->end, res->flags, and optionally * desc. If found, returns 0, res is overwritten, if not found, returns -1. * This function walks the whole tree and not just first level children until @@ -352,7 +352,7 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc, p = NULL; break; } - if ((p->end >= start) && (p->start < end)) + if ((p->end >= start) && (p->start <= end)) break; } ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 0/3] find_next_iomem_res() fixes
These fix: - A kexec off-by-one error that we probably never see in practice (only affects a resource starting at exactly 640KB). - A corner case in walk_iomem_res_desc(), walk_system_ram_res(), etc that we probably also never see in practice (only affects a single byte resource at the very end of the region we're searching) - An issue in walk_iomem_res_desc() that apparently causes a kdump issue (see Lianbo's note at [1]). I think we need to fix the kdump issue either by these patches (specifically the last one) or by the patch Lianbo posted [2]. I'm hoping to avoid deciding and merging these myself, but I'm not sure who really wants to own kernel/resource.c. [1] https://lore.kernel.org/lkml/01551d06-c421-5df3-b19f-fc66f3639...@redhat.com [2] https://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com --- Bjorn Helgaas (3): x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error resource: Include resource end in walk_*() interfaces resource: Fix find_next_iomem_res() iteration issue arch/x86/include/asm/kexec.h |2 - kernel/resource.c| 96 ++ 2 files changed, 43 insertions(+), 55 deletions(-) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 1/3] x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
From: Bjorn Helgaas The only use of KEXEC_BACKUP_SRC_END is as an argument to walk_system_ram_res(): int crash_load_segments(struct kimage *image) { ... walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END, image, determine_backup_region); walk_system_ram_res() expects "start, end" arguments that are inclusive, i.e., the range to be walked includes both the start and end addresses. KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the first address *past* the desired 0-640KB range. Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC region is [0-0x9], not [0-0xa]. Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call") Signed-off-by: Bjorn Helgaas --- arch/x86/include/asm/kexec.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index f327236f0fa7..5125fca472bb 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -67,7 +67,7 @@ struct kimage; /* Memory to backup during crash kdump */ #define KEXEC_BACKUP_SRC_START (0UL) -#define KEXEC_BACKUP_SRC_END (640 * 1024UL) /* 640K */ +#define KEXEC_BACKUP_SRC_END (640 * 1024UL - 1) /* 640K */ /* * CPU does not save ss and sp on stack if execution is already ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue
On Thu, Sep 27, 2018 at 01:27:41PM +0800, lijiang wrote: > 在 2018年09月25日 06:15, Bjorn Helgaas 写道: > > From: Bjorn Helgaas > > > > Previously find_next_iomem_res() used "*res" as both an input parameter for > > the range to search and the type of resource to search for, and an output > > parameter for the resource we found, which makes the interface confusing > > and hard to use correctly. > > > > All callers allocate a single struct resource and use it for repeated calls > > to find_next_iomem_res(). When find_next_iomem_res() returns a resource, > > it overwrites the start, end, flags, and desc members of the struct. If we > > call find_next_iomem_res() again, we must update or restore these fields. > > > > The callers (__walk_iomem_res_desc() and walk_system_ram_range()) do not > > restore res->flags, so if the caller is searching for flags of > > IORESOURCE_MEM | IORESOURCE_BUSY and finds a resource with flags of > > IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search will > > find only resources marked as IORESOURCE_SYSRAM. > > > > Fix this by restructuring the interface so it takes explicit "start, end, > > flags" parameters and uses "*res" only as an output parameter. > > Hi, Bjorn > I personally suggest that some comments might be added in the code, make it > clear > and easy to understand, then which could avoid the old confusion and more > code changes. Since I think the current interface (using *res as both input and output parameters that have very different meanings) is confusing, it's hard for *me* to write comments that make it less confusing, but of course, you're welcome to propose something. My opinion (probably not universally shared) is that my proposal would make the code more readable, and it's worth doing even though the diff is larger. Anyway, I'll post these patches independently and see if anybody else has an opinion. Bjorn > > Original-patch: > > http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com > > Based-on-patch-by: Lianbo Jiang > > Signed-off-by: Bjorn Helgaas > > --- > > kernel/resource.c | 94 > > +++-- > > 1 file changed, 41 insertions(+), 53 deletions(-) > > > > diff --git a/kernel/resource.c b/kernel/resource.c > > index 155ec873ea4d..9891ea90cc8f 100644 > > --- a/kernel/resource.c > > +++ b/kernel/resource.c > > @@ -319,23 +319,26 @@ int release_resource(struct resource *old) > > EXPORT_SYMBOL(release_resource); > > > > /* > > - * Finds the lowest iomem resource existing within [res->start..res->end]. > > - * The caller must specify res->start, res->end, res->flags, and optionally > > - * desc. If found, returns 0, res is overwritten, if not found, returns > > -1. > > - * This function walks the whole tree and not just first level children > > until > > - * and unless first_level_children_only is true. > > + * Finds the lowest iomem resource that covers part of [start..end]. The > > + * caller must specify start, end, flags, and desc (which may be > > + * IORES_DESC_NONE). > > + * > > + * If a resource is found, returns 0 and *res is overwritten with the part > > + * of the resource that's within [start..end]; if none is found, returns > > + * -1. > > + * > > + * This function walks the whole tree and not just first level children > > + * unless first_level_children_only is true. > > */ > > -static int find_next_iomem_res(struct resource *res, unsigned long desc, > > - bool first_level_children_only) > > +static int find_next_iomem_res(resource_size_t start, resource_size_t end, > > + unsigned long flags, unsigned long desc, > > + bool first_level_children_only, > > + struct resource *res) > > { > > - resource_size_t start, end; > > struct resource *p; > > bool sibling_only = false; > > > > BUG_ON(!res); > > - > > - start = res->start; > > - end = res->end; > > BUG_ON(start >= end); > > > > if (first_level_children_only) > > @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, > > unsigned long desc, > > read_lock(&resource_lock); > > > > for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) { > > - if ((p->flags & res->flags) != res->flags) > > + if ((p->flags & flags) != flags) > > continue; > > if ((desc != IORES_DESC_NONE) && (desc != p->desc)) > > continue; > > @@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, > > unsigned long desc, > > read_unlock(&resource_lock); > > if (!p) > > return -1; > > + > > /* copy data */ > > - if (res->start < p->start) > > - res->start = p->start; > > - if (res->end > p->end) > > - res->end = p->end; > > + res->start = max(start, p->start); > > + res->end = min(end, p->end); > > res->flags = p
Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory
On Thu, Sep 27, 2018 at 03:19:51PM +0800, Lianbo Jiang wrote: > diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h > index 6de64840dd22..f8795f9581c7 100644 > --- a/arch/x86/include/asm/io.h > +++ b/arch/x86/include/asm/io.h > @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t > offset, unsigned long size); > #define ioremap_cache ioremap_cache > extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long > size, unsigned long prot_val); > #define ioremap_prot ioremap_prot > +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, > + unsigned long size); No need to break this line - see how the other externs don't. > +#define ioremap_encrypted ioremap_encrypted > > /** > * ioremap - map bus memory into CPU space > diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c > index c63a545ec199..e01e6c695add 100644 > --- a/arch/x86/mm/ioremap.c > +++ b/arch/x86/mm/ioremap.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include What is that include for and why is it not up there with the ones? -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] x86/boot: Fix kexec booting failure after SEV early boot support
On 09/27/2018 07:38 AM, Kairui Song wrote: > Commit 1958b5fc4010 ("x86/boot: Add early boot support when running > with SEV active") is causing kexec becomes sometimes unstable even if > SEV is not active. kexec reboot won't start a second kernel bypassing > BIOS boot process, instead, the system got reset. > > That's because, in get_sev_encryption_bit function, we are using > 32-bit RIP-relative addressing to read the value of enc_bit, but > kexec may alloc the early boot up code to a higher location, which > is beyond 32-bit addressing limit. Some garbage will be read and > get_sev_encryption_bit will return the wrong value, which leads to > wrong memory page flag. > > This patch removes the use of enc_bit, as currently, enc_bit's only > purpose is to avoid duplicated encryption bit reading, but the overhead > of reading encryption bit is so tiny, so no need to cache that. > > Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV > active") > Suggested-by: Borislav Petkov > Signed-off-by: Kairui Song Reviewed-by: Tom Lendacky > --- > arch/x86/boot/compressed/mem_encrypt.S | 19 --- > 1 file changed, 19 deletions(-) > > diff --git a/arch/x86/boot/compressed/mem_encrypt.S > b/arch/x86/boot/compressed/mem_encrypt.S > index eaa843a52907..a480356e0ed8 100644 > --- a/arch/x86/boot/compressed/mem_encrypt.S > +++ b/arch/x86/boot/compressed/mem_encrypt.S > @@ -25,20 +25,6 @@ ENTRY(get_sev_encryption_bit) > push%ebx > push%ecx > push%edx > - push%edi > - > - /* > - * RIP-relative addressing is needed to access the encryption bit > - * variable. Since we are running in 32-bit mode we need this call/pop > - * sequence to get the proper relative addressing. > - */ > - call1f > -1: popl%edi > - subl$1b, %edi > - > - movlenc_bit(%edi), %eax > - cmpl$0, %eax > - jge .Lsev_exit > > /* Check if running under a hypervisor */ > movl$1, %eax > @@ -69,15 +55,12 @@ ENTRY(get_sev_encryption_bit) > > movl%ebx, %eax > andl$0x3f, %eax /* Return the encryption bit location */ > - movl%eax, enc_bit(%edi) > jmp .Lsev_exit > > .Lno_sev: > xor %eax, %eax > - movl%eax, enc_bit(%edi) > > .Lsev_exit: > - pop %edi > pop %edx > pop %ecx > pop %ebx > @@ -113,8 +96,6 @@ ENTRY(set_sev_encryption_mask) > ENDPROC(set_sev_encryption_mask) > > .data > -enc_bit: > - .int0x > > #ifdef CONFIG_AMD_MEM_ENCRYPT > .balign 8 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH] x86/boot: Fix kexec booting failure after SEV early boot support
Commit 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active") is causing kexec becomes sometimes unstable even if SEV is not active. kexec reboot won't start a second kernel bypassing BIOS boot process, instead, the system got reset. That's because, in get_sev_encryption_bit function, we are using 32-bit RIP-relative addressing to read the value of enc_bit, but kexec may alloc the early boot up code to a higher location, which is beyond 32-bit addressing limit. Some garbage will be read and get_sev_encryption_bit will return the wrong value, which leads to wrong memory page flag. This patch removes the use of enc_bit, as currently, enc_bit's only purpose is to avoid duplicated encryption bit reading, but the overhead of reading encryption bit is so tiny, so no need to cache that. Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active") Suggested-by: Borislav Petkov Signed-off-by: Kairui Song --- arch/x86/boot/compressed/mem_encrypt.S | 19 --- 1 file changed, 19 deletions(-) diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S index eaa843a52907..a480356e0ed8 100644 --- a/arch/x86/boot/compressed/mem_encrypt.S +++ b/arch/x86/boot/compressed/mem_encrypt.S @@ -25,20 +25,6 @@ ENTRY(get_sev_encryption_bit) push%ebx push%ecx push%edx - push%edi - - /* -* RIP-relative addressing is needed to access the encryption bit -* variable. Since we are running in 32-bit mode we need this call/pop -* sequence to get the proper relative addressing. -*/ - call1f -1: popl%edi - subl$1b, %edi - - movlenc_bit(%edi), %eax - cmpl$0, %eax - jge .Lsev_exit /* Check if running under a hypervisor */ movl$1, %eax @@ -69,15 +55,12 @@ ENTRY(get_sev_encryption_bit) movl%ebx, %eax andl$0x3f, %eax /* Return the encryption bit location */ - movl%eax, enc_bit(%edi) jmp .Lsev_exit .Lno_sev: xor %eax, %eax - movl%eax, enc_bit(%edi) .Lsev_exit: - pop %edi pop %edx pop %ecx pop %ebx @@ -113,8 +96,6 @@ ENTRY(set_sev_encryption_mask) ENDPROC(set_sev_encryption_mask) .data -enc_bit: - .int0x #ifdef CONFIG_AMD_MEM_ENCRYPT .balign 8 -- 2.17.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 2/6] ima: prevent kexec_load syscall based on runtime secureboot flag
[Cc'ing the kexec mailing list, and Seth] On Wed, 2018-09-26 at 17:52 +0530, Nayna Jain wrote: > When CONFIG_KEXEC_VERIFY_SIG is enabled, the kexec_file_load syscall > requires the kexec'd kernel image to be signed. Distros are concerned > about totally disabling the kexec_load syscall. As a compromise, the > kexec_load syscall will only be disabled when CONFIG_KEXEC_VERIFY_SIG > is configured and the system is booted with secureboot enabled. > > This patch disables the kexec_load syscall only for systems booted with > secureboot enabled. > > Signed-off-by: Nayna Jain Nice! Mimi > --- > security/integrity/ima/ima_main.c | 17 +++-- > 1 file changed, 11 insertions(+), 6 deletions(-) > > diff --git a/security/integrity/ima/ima_main.c > b/security/integrity/ima/ima_main.c > index dce0a8a217bb..bdb6e5563d05 100644 > --- a/security/integrity/ima/ima_main.c > +++ b/security/integrity/ima/ima_main.c > @@ -505,20 +505,24 @@ int ima_post_read_file(struct file *file, void *buf, > loff_t size, > */ > int ima_load_data(enum kernel_load_data_id id) > { > - bool sig_enforce; > + bool ima_enforce, sig_enforce; > > - if ((ima_appraise & IMA_APPRAISE_ENFORCE) != IMA_APPRAISE_ENFORCE) > - return 0; > + ima_enforce = > + (ima_appraise & IMA_APPRAISE_ENFORCE) == IMA_APPRAISE_ENFORCE; > > switch (id) { > case LOADING_KEXEC_IMAGE: > - if (ima_appraise & IMA_APPRAISE_KEXEC) { > +#ifdef CONFIG_KEXEC_VERIFY_SIG > + if (arch_ima_get_secureboot()) > + return -EACCES; > +#endif > + if (ima_enforce && (ima_appraise & IMA_APPRAISE_KEXEC)) { > pr_err("impossible to appraise a kernel image without a > file descriptor; try using kexec_file_load syscall.\n"); > return -EACCES; /* INTEGRITY_UNKNOWN */ > } > break; > case LOADING_FIRMWARE: > - if (ima_appraise & IMA_APPRAISE_FIRMWARE) { > + if (ima_enforce && (ima_appraise & IMA_APPRAISE_FIRMWARE)) { > pr_err("Prevent firmware sysfs fallback loading.\n"); > return -EACCES; /* INTEGRITY_UNKNOWN */ > } > @@ -526,7 +530,8 @@ int ima_load_data(enum kernel_load_data_id id) > case LOADING_MODULE: > sig_enforce = is_module_sig_enforced(); > > - if (!sig_enforce && (ima_appraise & IMA_APPRAISE_MODULES)) { > + if (ima_enforce && (!sig_enforce > + && (ima_appraise & IMA_APPRAISE_MODULES))) { > pr_err("impossible to appraise a module without a file > descriptor. sig_enforce kernel parameter might help\n"); > return -EACCES; /* INTEGRITY_UNKNOWN */ > } ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v4 1/6] x86/ima: define arch_ima_get_secureboot
[Cc'ing the kexec mailing list, and Seth] On Wed, 2018-09-26 at 17:52 +0530, Nayna Jain wrote: > Distros are concerned about totally disabling the kexec_load syscall. > As a compromise, the kexec_load syscall will only be disabled when > CONFIG_KEXEC_VERIFY_SIG is configured and the system is booted with > secureboot enabled. > > This patch defines the new arch specific function called > arch_ima_get_secureboot() to retrieve the secureboot state of the system. > > Signed-off-by: Nayna Jain > Suggested-by: Seth Forshee Nice! Mimi > --- > arch/x86/kernel/Makefile | 2 ++ > arch/x86/kernel/ima_arch.c | 17 + > include/linux/ima.h| 9 + > 3 files changed, 28 insertions(+) > create mode 100644 arch/x86/kernel/ima_arch.c > > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile > index 02d6f5cf4e70..f32406e51424 100644 > --- a/arch/x86/kernel/Makefile > +++ b/arch/x86/kernel/Makefile > @@ -149,3 +149,5 @@ ifeq ($(CONFIG_X86_64),y) > obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o > obj-y += vsmp_64.o > endif > + > +obj-$(CONFIG_IMA)+= ima_arch.o > diff --git a/arch/x86/kernel/ima_arch.c b/arch/x86/kernel/ima_arch.c > new file mode 100644 > index ..bb5a88d2b271 > --- /dev/null > +++ b/arch/x86/kernel/ima_arch.c > @@ -0,0 +1,17 @@ > +/* SPDX-License-Identifier: GPL-2.0+ */ > +/* > + * Copyright (C) 2018 IBM Corporation > + */ > +#include > +#include > + > +extern struct boot_params boot_params; > + > +bool arch_ima_get_secureboot(void) > +{ > + if (efi_enabled(EFI_BOOT) && > + (boot_params.secure_boot == efi_secureboot_mode_enabled)) > + return true; > + else > + return false; > +} > diff --git a/include/linux/ima.h b/include/linux/ima.h > index 84806b54b50a..4852255aa4f4 100644 > --- a/include/linux/ima.h > +++ b/include/linux/ima.h > @@ -30,6 +30,15 @@ extern void ima_post_path_mknod(struct dentry *dentry); > extern void ima_add_kexec_buffer(struct kimage *image); > #endif > > +#ifdef CONFIG_X86 > +extern bool arch_ima_get_secureboot(void); > +#else > +static inline bool arch_ima_get_secureboot(void) > +{ > + return false; > +} > +#endif > + > #else > static inline int ima_bprm_check(struct linux_binprm *bprm) > { ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v7 RESEND 4/4] kdump/vmcore: support encrypted old memory with SME enabled
In kdump kernel, we need to dump the old memory into vmcore file,if SME is enabled in the first kernel, we have to remap the old memory with the memory encryption mask, which will be automatically decrypted when we read from DRAM. For SME kdump, there are two cases that doesn't support: -- | first-kernel | second-kernel | kdump support | | (mem_encrypt=on|off)| (yes|no)| |--+---+---| | on | on| yes | | off | off | yes | | on | off | no| | off | on| no| |__|___|___| 1. SME is enabled in the first kernel, but SME is disabled in kdump kernel In this case, because the old memory is encrypted, we can't decrypt the old memory. 2. SME is disabled in the first kernel, but SME is enabled in kdump kernel On the one hand, the old memory is unencrypted, the old memory can be dumped as usual, we don't need to enable SME in kdump kernel; On the other hand, it will increase the complexity of the code, we will have to consider how to pass the SME flag from the first kernel to the kdump kernel, it is really too expensive to do this. This patches are only for SME kdump, the patches don't support SEV kdump. Signed-off-by: Lianbo Jiang Reviewed-by: Tom Lendacky --- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/crash_dump_encrypt.c | 53 fs/proc/vmcore.c | 21 +++ include/linux/crash_dump.h | 12 +++ 4 files changed, 81 insertions(+), 6 deletions(-) create mode 100644 arch/x86/kernel/crash_dump_encrypt.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 8824d01c0c35..dfbeae0e35ce 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -97,6 +97,7 @@ obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o obj-$(CONFIG_KEXEC_FILE) += kexec-bzimage64.o obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o +obj-$(CONFIG_AMD_MEM_ENCRYPT) += crash_dump_encrypt.o obj-y += kprobes/ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_DOUBLEFAULT) += doublefault.o diff --git a/arch/x86/kernel/crash_dump_encrypt.c b/arch/x86/kernel/crash_dump_encrypt.c new file mode 100644 index ..e1b1a577f197 --- /dev/null +++ b/arch/x86/kernel/crash_dump_encrypt.c @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Memory preserving reboot related code. + * + * Created by: Lianbo Jiang (liji...@redhat.com) + * Copyright (C) RedHat Corporation, 2018. All rights reserved + */ + +#include +#include +#include +#include + +/** + * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted" + * @pfn: page frame number to be copied + * @buf: target memory address for the copy; this can be in kernel address + * space or user address space (see @userbuf) + * @csize: number of bytes to copy + * @offset: offset in bytes into the page (based on pfn) to begin the copy + * @userbuf: if set, @buf is in user address space, use copy_to_user(), + * otherwise @buf is in kernel address space, use memcpy(). + * + * Copy a page from "oldmem encrypted". For this page, there is no pte + * mapped in the current kernel. We stitch up a pte, similar to + * kmap_atomic. + */ + +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, + size_t csize, unsigned long offset, int userbuf) +{ + void *vaddr; + + if (!csize) + return 0; + + vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, + PAGE_SIZE); + if (!vaddr) + return -ENOMEM; + + if (userbuf) { + if (copy_to_user((void __user *)buf, vaddr + offset, csize)) { + iounmap((void __iomem *)vaddr); + return -EFAULT; + } + } else + memcpy(buf, vaddr + offset, csize); + + set_iounmap_nonlazy(); + iounmap((void __iomem *)vaddr); + return csize; +} diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index cbde728f8ac6..3065c8bada6a 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -25,6 +25,9 @@ #include #include #include +#include +#include +#include #include "internal.h" /* List representing chunks of contiguous memory areas and their offsets in @@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn) /* Reads a page from the oldmem device from given offset. */ static ssize_t read_from_oldmem(char *buf, size_t count, - u64 *ppos, int userbuf) + u64 *ppos, int userbuf, + bool encrypted) { unsigned long pfn, offset;
[PATCH v7 RESEND 3/4] iommu/amd: Remap the device table of IOMMU with the memory encryption mask for kdump
In kdump kernel, it will copy the device table of IOMMU from the old device table, which is encrypted when SME is enabled in the first kernel. So we have to remap the old device table with the memory encryption mask. Signed-off-by: Lianbo Jiang Reviewed-by: Tom Lendacky Acked-by: Joerg Roedel --- drivers/iommu/amd_iommu_init.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 84b3e4445d46..3931c7de7c69 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -902,12 +902,22 @@ static bool copy_device_table(void) } } - old_devtb_phys = entry & PAGE_MASK; + /* +* When SME is enabled in the first kernel, the entry includes the +* memory encryption mask(sme_me_mask), we must remove the memory +* encryption mask to obtain the true physical address in kdump kernel. +*/ + old_devtb_phys = __sme_clr(entry) & PAGE_MASK; + if (old_devtb_phys >= 0x1ULL) { pr_err("The address of old device table is above 4G, not trustworthy!\n"); return false; } - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + old_devtb = (sme_active() && is_kdump_kernel()) + ? (__force void *)ioremap_encrypted(old_devtb_phys, + dev_table_size) + : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + if (!old_devtb) return false; -- 2.17.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled
When SME is enabled in the first kernel, we will allocate unencrypted pages for kdump in order to be able to boot the kdump kernel like kexec. Signed-off-by: Lianbo Jiang Reviewed-by: Tom Lendacky --- kernel/kexec_core.c | 12 1 file changed, 12 insertions(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 23a83a4da38a..e7efcd1a977b 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image, } } + if (pages) { + /* +* For kdump, we need to ensure that these pages are +* unencrypted pages if SME is enabled. +* By the way, it is unnecessary to call the arch_ +* kexec_pre_free_pages(), which will make the code +* become more simple. +*/ + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0); + } return pages; } @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = -ENOMEM; goto out; } + arch_kexec_post_alloc_pages(page_address(page), 1, 0); ptr = kmap(page); ptr += maddr & ~PAGE_MASK; mchunk = min_t(size_t, mbytes, @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = copy_from_user(ptr, buf, uchunk); kexec_flush_icache_page(page); kunmap(page); + arch_kexec_pre_free_pages(page_address(page), 1); if (result) { result = -EFAULT; goto out; -- 2.17.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory
When SME is enabled on AMD machine, the memory is encrypted in the first kernel. In this case, SME also needs to be enabled in kdump kernel, and we have to remap the old memory with the memory encryption mask. Here we only talk about the case that SME is active in the first kernel, and only care it's active too in kdump kernel. there are four cases we need considered. a. dump vmcore It is encrypted in the first kernel, and needs be read out in kdump kernel. b. crash notes When dumping vmcore, the people usually need to read the useful information from notes, and the notes is also encrypted. c. iommu device table It is allocated by kernel, need fill its pointer into mmio of amd iommu. It's encrypted in the first kernel, need read the old content to analyze and get useful information. d. mmio of amd iommu Register reported by amd firmware, it's not RAM, we don't encrypt in both the first kernel and kdump kernel. To achieve the goal, the solution is: 1. add a new bool parameter "encrypted" to __ioremap_caller() It is a low level function, and check the newly added parameter, if it's true and in kdump kernel, will remap the memory with sme mask. 2. add a new function ioremap_encrypted() to explicitly passed in a "true" value for "encrypted". For above a, b, c, we will call ioremap_encrypted(); 3. adjust all existed ioremap wrapper functions, passed in "false" for encrypted to make them an before. ioremap_encrypted()\ ioremap_cache() | ioremap_prot() | ioremap_wt()|->__ioremap_caller() ioremap_wc()| ioremap_uc()| ioremap_nocache() / Signed-off-by: Lianbo Jiang Reviewed-by: Tom Lendacky --- arch/x86/include/asm/io.h | 3 +++ arch/x86/mm/ioremap.c | 25 + 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 6de64840dd22..f8795f9581c7 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); #define ioremap_cache ioremap_cache extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val); #define ioremap_prot ioremap_prot +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, + unsigned long size); +#define ioremap_encrypted ioremap_encrypted /** * ioremap - map bus memory into CPU space diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index c63a545ec199..e01e6c695add 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "physaddr.h" @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size, * caller shouldn't need to know that small detail. */ static void __iomem *__ioremap_caller(resource_size_t phys_addr, - unsigned long size, enum page_cache_mode pcm, void *caller) + unsigned long size, enum page_cache_mode pcm, + void *caller, bool encrypted) { unsigned long offset, vaddr; resource_size_t last_addr; @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, * resulting mapping. */ prot = PAGE_KERNEL_IO; - if (sev_active() && mem_flags.desc_other) + if ((sev_active() && mem_flags.desc_other) || encrypted) prot = pgprot_encrypted(prot); switch (pcm) { @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS; return __ioremap_caller(phys_addr, size, pcm, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_nocache); @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size) enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC; return __ioremap_caller(phys_addr, size, pcm, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL_GPL(ioremap_uc); @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc); void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size) { return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_wc); @@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc); void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size) { return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT, - __builtin_retur
[PATCH v7 RESEND 0/4] Support kdump for AMD secure memory encryption(SME)
When SME is enabled on AMD machine, we also need to support kdump. Because the memory is encrypted in the first kernel, we will remap the old memory to the kdump kernel for dumping data, and SME is also enabled in the kdump kernel, otherwise the old memory can not be decrypted. For the kdump, it is necessary to distinguish whether the memory is encrypted. Furthermore, we should also know which part of the memory is encrypted or decrypted. We will appropriately remap the memory according to the specific situation in order to tell cpu how to access the memory. As we know, a page of memory that is marked as encrypted, which will be automatically decrypted when read from DRAM, and will also be automatically encrypted when written to DRAM. If the old memory is encrypted, we have to remap the old memory with the memory encryption mask, which will automatically decrypt the old memory when we read those data. For kdump(SME), there are two cases that doesn't support: -- | first-kernel | second-kernel | kdump support | | (mem_encrypt=on|off)| (yes|no)| |--+---+---| | on | on| yes | | off | off | yes | | on | off | no| | off | on| no| |__|___|___| 1. SME is enabled in the first kernel, but SME is disabled in kdump kernel In this case, because the old memory is encrypted, we can't decrypt the old memory. 2. SME is disabled in the first kernel, but SME is enabled in kdump kernel It is unnecessary to support in this case, because the old memory is unencrypted, the old memory can be dumped as usual, we don't need to enable SME in kdump kernel. Another, If we must support the scenario, it will increase the complexity of the code, we will have to consider how to pass the SME flag from the first kernel to the kdump kernel, in order to let the kdump kernel know that whether the old memory is encrypted. There are two methods to pass the SME flag to the kdump kernel. The first method is to modify the assembly code, which includes some common code and the path is too long. The second method is to use kexec tool, which could require the SME flag to be exported in the first kernel by "proc" or "sysfs", kexec tools will read the SME flag from "proc" or "sysfs" when we use kexec tools to load image, subsequently the SME flag will be saved in boot_params, we can properly remap the old memory according to the previously saved SME flag. But it is too expensive to do this. This patches are only for SME kdump, the patches don't support SEV kdump. Test tools: makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile commit "A draft for kdump vmcore about AMD SME" Note: This patch can only dump vmcore in the case of SME enabled. crash-7.2.3: https://github.com/crash-utility/crash.git commit <001f77a05585> "Fix for Linux 4.19-rc1 and later kernels that contain kernel commit <7290d5809571>" kexec-tools-2.0.17: git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git commit "kexec: fix for "Unhandled rela relocation: R_X86_64_PLT32" error" Note: Before you load the kernel and initramfs for kdump, this patch( http://lists.infradead.org/pipermail/kexec/2018-September/021460.html) must be merged to kexec-tools, and then the kdump kernel will work well. Because there is a patch which is removed based on v6(x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask). Test environment: HP ProLiant DL385Gen10 AMD EPYC 7251 8-Core Processor 32768 MB memory 600 GB disk space Linux 4.19-rc5: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git commit <6bf4ca7fbc85> "Linux 4.19-rc5" Reference: AMD64 Architecture Programmer's Manual https://support.amd.com/TechDocs/24593.pdf Changes since v6: 1. There is a patch which is removed based on v6. (x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust encryption mask) Dave Young suggests that this patch can be removed and fix the kexec-tools. Reference: http://lists.infradead.org/pipermail/kexec/2018-September/021460.html) 2. Update the patch log. Changes since v7: 1. Improve patch log for patch 1/4(Suggested by Baoquan He) 2. Add Reviewed-by for all patches(Tom Lendacky ) 3. Add Acked-by for patch 3/4(Joerg Roedel ) Some known issues: 1. about SME Upstream kernel will hang on HP machine(DL385Gen10 AMD EPYC 7251) when we execute the kexec command as follow: # kexec -l /boot/vmlinuz-4.19.0-rc5+ --initrd=/boot/initramfs-4.19.0-rc5+.img --command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on rd.lvm.lv=rhel_hp-dl385g10-03/root rd.lvm.lv=rhel_hp-dl385g10-03/swap console=ttyS0,115200n81 LANG=en_US.UTF-8 earlyprintk=serial debug nokaslr" # kexec -e (or reboot) But this issue can not be reproduced on speedway machine, and this iss