Re: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code
Hi Baoquan, kernel test robot noticed the following build errors: [auto build test ERROR on arm64/for-next/core] [also build test ERROR on tip/x86/core powerpc/next powerpc/fixes v6.5] [cannot apply to linus/master next-20230829] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/crash_core-c-remove-unnecessary-parameter-of-function/20230829-201942 base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core patch link: https://lore.kernel.org/r/20230829121610.138107-7-bhe%40redhat.com patch subject: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code config: x86_64-randconfig-r022-20230830 (https://download.01.org/0day-ci/archive/20230830/202308300910.e0i4pijt-...@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230830/202308300910.e0i4pijt-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202308300910.e0i4pijt-...@intel.com/ All errors (new ones prefixed by >>): ld: vmlinux.o: in function `reserve_crashkernel_low': kernel/crash_core.c:369: undefined reference to `crashk_low_res' ld: kernel/crash_core.c:369: undefined reference to `crashk_low_res' ld: kernel/crash_core.c:370: undefined reference to `crashk_low_res' ld: kernel/crash_core.c:369: undefined reference to `crashk_low_res' ld: kernel/crash_core.c:370: undefined reference to `crashk_low_res' ld: vmlinux.o:kernel/crash_core.c:371: more undefined references to `crashk_low_res' follow ld: vmlinux.o: in function `reserve_crashkernel_generic': >> kernel/crash_core.c:453: undefined reference to `crashk_res' >> ld: kernel/crash_core.c:453: undefined reference to `crashk_res' ld: kernel/crash_core.c:454: undefined reference to `crashk_res' >> ld: kernel/crash_core.c:453: undefined reference to `crashk_res' ld: kernel/crash_core.c:454: undefined reference to `crashk_res' ld: vmlinux.o:kernel/crash_core.c:454: more undefined references to `crashk_res' follow vim +453 kernel/crash_core.c 71d2bcec2d4d69 Philipp Rudo 2021-12-24 353 6bee83d29d2e09 Baoquan He 2023-08-29 354 #ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION 6bee83d29d2e09 Baoquan He 2023-08-29 355 static int __init reserve_crashkernel_low(unsigned long long low_size) 6bee83d29d2e09 Baoquan He 2023-08-29 356 { 6bee83d29d2e09 Baoquan He 2023-08-29 357 #ifdef CONFIG_64BIT 6bee83d29d2e09 Baoquan He 2023-08-29 358 unsigned long long low_base; 6bee83d29d2e09 Baoquan He 2023-08-29 359 6bee83d29d2e09 Baoquan He 2023-08-29 360 low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); 6bee83d29d2e09 Baoquan He 2023-08-29 361 if (!low_base) { 6bee83d29d2e09 Baoquan He 2023-08-29 362 pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size); 6bee83d29d2e09 Baoquan He 2023-08-29 363 return -ENOMEM; 6bee83d29d2e09 Baoquan He 2023-08-29 364 } 6bee83d29d2e09 Baoquan He 2023-08-29 365 6bee83d29d2e09 Baoquan He 2023-08-29 366 pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n", 6bee83d29d2e09 Baoquan He 2023-08-29 367 low_base, low_base + low_size, low_size >> 20); 6bee83d29d2e09 Baoquan He 2023-08-29 368 6bee83d29d2e09 Baoquan He 2023-08-29 @369 crashk_low_res.start = low_base; 6bee83d29d2e09 Baoquan He 2023-08-29 370 crashk_low_res.end = low_base + low_size - 1; 6bee83d29d2e09 Baoquan He 2023-08-29 371 insert_resource(_resource, _low_res); 6bee83d29d2e09 Baoquan He 2023-08-29 372 #endif 6bee83d29d2e09 Baoquan He 2023-08-29 373 return 0; 6bee83d29d2e09 Baoquan He 2023-08-29 374 } 6bee83d29d2e09 Baoquan He 2023-08-29 375 6bee83d29d2e09 Baoquan He 2023-08-29 376 void __init reserve_crashkernel_generic(char *cmdline, 6bee83d29d2e09 Baoquan He 2023-08-29 377 unsigned long long crash_size, 6bee83d29d2e09 Baoquan He 2023-08-29 378 unsigned long long crash_base, 6bee83d29d2e09 Baoquan He 2023-08-29 379 unsigned long long crash_low_size, 6bee83d29d2e09 Baoquan He 2023-08-29 380 bool high) 6bee83d29d2e09 Baoquan He 2023-08-29 381 { 6bee83d29d2e09 Baoquan He 2023-08-29 382 unsigned long long search_end = CRASH_ADDR_LOW_MAX, search_base = 0; 6bee83d29d2e09 Baoquan He 2023-08-29 383 bool fixed_base =
Re: [RFC] IMA Log Snapshotting Design Proposal
On Tue, Aug 29, 2023 at 5:54 PM Mimi Zohar wrote: > On Tue, 2023-08-29 at 17:30 -0400, Paul Moore wrote: > > On Tue, Aug 29, 2023 at 5:05 PM Mimi Zohar wrote: > > > On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote: > > > > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar wrote: > > > > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote: > > > > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote: > > > > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote: > > > > > > >>> This design seems overly complex and requires synchronization > > > > > > >>> between > > > > > > >>> the "snapshot" record and exporting the records from the > > > > > > >>> measurement > > > > > > >>> list. None of this would be necessary if the measurements were > > > > > > >>> copied > > > > > > >>> from kernel memory to a backing file (e.g. tmpfs), as described > > > > > > >>> in [1]. > > > > > > Even if the Kernel maintains the link between a tmpfs exported and > > > > > > an > > > > > > in-memory IMA log - it still has to copy the tmpfs portion to the > > > > > > Kernel memory during kexec soft boot. tmpfs is cleared during > > > > > > kexec, > > > > > > so this copying of tmpfs back to kernel memory is necessary to > > > > > > preserve > > > > > > the integrity of the log during kexec. But the copying would add > > > > > > back > > > > > > the memory pressure on the node during kexec (which may result in > > > > > > out-of-memory), defeating the purpose of the overall effort/feature. > > > > > > Copying to a regular *persistent* protected file seems a cleaner > > > > > > approach, compared to tmpfs. > > > > > > > > > > From a kernel perspective, it doesn't make a difference if userspace > > > > > provides a tmpfs or persistent file. As per the discussion > > > > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t > > > > > , userspace provides the kernel with the file descriptor of the opened > > > > > file. > > > > > > > > > > > We prototyped this solution, however it > > > > > > does not seem to be a common pattern within the Kernel to write > > > > > > state > > > > > > directly to files on disk file systems. We considered two potential > > > > > > options: > > > > > > > > > > If no file descriptor is provided, then the measurements aren't copied > > > > > and removed from the securityfs file. If there are write errors, the > > > > > measurements aren't removed from the securityfs file until the write > > > > > errors are resolved. > > > > > > > > It sounds like this approach would require the file/filesystem to be > > > > continuously available for the life of the system once the log was > > > > snapshotted/overflowed to persistent storage, yes? Assuming that is > > > > the case, what happens if the file/filesystem becomes inaccessible at > > > > some point and an attestation client attempts to read the entire log? > > > > > > The main purpose of the change is to addres kernel memory pressure. > > > Two designs are being discussed: Sush's "snapshotting" design and > > > Amir's original suggestion of continously exporting the measurement > > > records to a tmpfs or regular file. Both designs require verifying the > > > initial attestation quote by walking the entire measurement list, > > > calculating the expected TPM PCR value(s). That doesn't change. > > > > Sure, but my question is about what happens if portions of the > > measurement list disappear due to file/filesystem problems? How is > > that handled? > > With the "snapshotting" solution there could be multiple files, so > portions could be missing. The other solution, the preferred solution, > would be one file. With the snapshotting approach the kernel doesn't need to maintain ongoing access to a file, that is left up to the user process performing the attestation (or simply inspecting the logs). I have to ask, for the third time now in as many hours, how does the proposed kernel-holds-an-fd-open solution handle the case where the file/filesystem is no longer accessible? The snapshotting approach should be much more resilient here as the error recovery paths can be much more involved than what we would have available in the kernel, not to mention the flexibility of allowing a user process to determine how to store and manage the snapshotted log. Considering that the snapshotting approach is opt-in (userspace has to initiate the snapshot), I'm not sure the concern over log offsets is a significant objection: existing userspace will never trigger a snapshot, and new userspace that could potentially trigger a snapshot should be written to take that into account. -- paul-moore.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC PATCH] Introduce persistent memory pool
On Mon, Aug 28, 2023 at 10:50:19PM +0200, Alexander Graf wrote: > +kexec, iommu, kvm > > On 23.08.23 04:45, Stanislav Kinsburskii wrote: > > > > +akpm, +linux-mm > > > > On Fri, Aug 25, 2023 at 01:32:40PM +, Gowans, James wrote: > > > On Fri, 2023-08-25 at 10:05 +0200, Greg Kroah-Hartman wrote: > > > > > > Thanks for adding me to this thread Greg! > > > > > > > On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote: > > > > > This patch addresses the need for a memory allocator dedicated to > > > > > persistent memory within the kernel. This allocator will preserve > > > > > kernel-specific states like DMA passthrough device states, IOMMU > > > > > state, and > > > > > more across kexec. > > > > > The proposed solution offers a foundational implementation for > > > > > potential > > > > > custom solutions that might follow. Though the implementation is > > > > > intentionally kept concise and straightforward to foster discussion > > > > > and > > > > > feedback, it's fully functional in its current state. > > > Hi Stanislav, it looks like we're working on similar things. I'm looking > > > to develop a mechanism to support hypervisor live update for when KVM is > > > running VMs with PCI device passthrough. VMs with device passthrough > > > also necessitates passing and re-hydrating IOMMU state so that DMA can > > > continue during live update. > > > > > > Planning on having an LPC session on this topic: > > > https://lpc.events/event/17/abstracts/1629/ (currently it's only a > > > submitted abstract so not sure if visible, hopefully it will be soon). > > > > > > We are looking at implementing persistence across kexec via an in-memory > > > filesystem on top of reserved memory. This would have files for anything > > > that needs to be persisted. That includes files for IOMMU pgtables, for > > > guest memory or userspace-accessible memory. > > > > > > It may be nice to solve all kexec persistence requirements with one > > > solution, but we can consider IOMMU separately. There are at least three > > > ways that this can be done: > > > a) carving out reserved memory for pgtables. This is done by your > > > proposal here, as well as my suggestion of a filesystem. > > > b) pre/post kexec hooks for drivers to serialise state and pass it > > > across in a structured format from old to new kernel. > > > c) Reconstructing IOMMU state in the new kernel by starting at the > > > hardware registers and walking the page tables. No state passing needed. > > > > > > Have you considered option (b) and (c) here? One of the implications of > > > (b) and (c) are that they would need to hook into the buddy allocator > > > really early to be able to carve out the reconstructed page tables > > > before the allocator is used. Similar to how pkram [0] hooks in early to > > > carve out pages used for its filesystem. > > > > > Hi James, > > > > We are indeed working on similar things, so thanks for chiming in. > > I've seen pkram proposal as well as your comments there. > > > > I think (b) will need some persistent-over-kexec memory to pass the > > state across kexec as well as some key-value store persisted as well. > > And the proposed persistent memory pool is aimed exactly for this > > purpose. > > Or do you imply some other way to pass driver's data accross kexec? > > > If I had to build this, I'd probably do it just like device tree passing on > ARM. It's a single, physically contiguous blob of data whose entry point you > pass to the target kernel. IIRC ACPI passing works similarly. This would > just be one more opaque data structure that then needs very strict > versioning and forward/backward compat guarantees. > Device tree or ACPI are options indeed. However AFAIU in case of DT user space has to involved into the picture to modify and complie it, while ACPI isn't flexible or easily extendable. Also, AFAIU both these standards were designed with passing hardware-specific data in mind from bootstrap software to an OS kernel and thus were never really intended to be used for creating a persistent state accross kexec. To me, an attempt to use either of them to pass kernel-specific data looks like an abuse (or misuse) excused by the simplicity of implementation. > > > I dind't consider (c) yet, thanks for for the pointer. > > > > I have a question in this scope: how is PCI devices registers state is > > persisted > > across kexec with the files system you are working on? I.e. how does > > driver know, that the device shouldn't not be reinitialized? > > > The easiest way to do it initially would be kernel command line options that > hack up the drivers. But I suppose depending on the option we go with, you > can also use the respective "natural" path: > > (a) A special metadata file that explains the state to the driver > (b) An entry in the structured file format that explains the state to the > target driver > (c) Compatible target drivers try to enumerate state from the target >
Re: [RFC] IMA Log Snapshotting Design Proposal
On Tue, 2023-08-29 at 17:30 -0400, Paul Moore wrote: > On Tue, Aug 29, 2023 at 5:05 PM Mimi Zohar wrote: > > On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote: > > > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar wrote: > > > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote: > > > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote: > > > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote: > > > > > >>> This design seems overly complex and requires synchronization > > > > > >>> between > > > > > >>> the "snapshot" record and exporting the records from the > > > > > >>> measurement > > > > > >>> list. None of this would be necessary if the measurements were > > > > > >>> copied > > > > > >>> from kernel memory to a backing file (e.g. tmpfs), as described > > > > > >>> in [1]. > > > > > Even if the Kernel maintains the link between a tmpfs exported and an > > > > > in-memory IMA log - it still has to copy the tmpfs portion to the > > > > > Kernel memory during kexec soft boot. tmpfs is cleared during kexec, > > > > > so this copying of tmpfs back to kernel memory is necessary to > > > > > preserve > > > > > the integrity of the log during kexec. But the copying would add back > > > > > the memory pressure on the node during kexec (which may result in > > > > > out-of-memory), defeating the purpose of the overall effort/feature. > > > > > Copying to a regular *persistent* protected file seems a cleaner > > > > > approach, compared to tmpfs. > > > > > > > > From a kernel perspective, it doesn't make a difference if userspace > > > > provides a tmpfs or persistent file. As per the discussion > > > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t > > > > , userspace provides the kernel with the file descriptor of the opened > > > > file. > > > > > > > > > We prototyped this solution, however it > > > > > does not seem to be a common pattern within the Kernel to write state > > > > > directly to files on disk file systems. We considered two potential > > > > > options: > > > > > > > > If no file descriptor is provided, then the measurements aren't copied > > > > and removed from the securityfs file. If there are write errors, the > > > > measurements aren't removed from the securityfs file until the write > > > > errors are resolved. > > > > > > It sounds like this approach would require the file/filesystem to be > > > continuously available for the life of the system once the log was > > > snapshotted/overflowed to persistent storage, yes? Assuming that is > > > the case, what happens if the file/filesystem becomes inaccessible at > > > some point and an attestation client attempts to read the entire log? > > > > The main purpose of the change is to addres kernel memory pressure. > > Two designs are being discussed: Sush's "snapshotting" design and > > Amir's original suggestion of continously exporting the measurement > > records to a tmpfs or regular file. Both designs require verifying the > > initial attestation quote by walking the entire measurement list, > > calculating the expected TPM PCR value(s). That doesn't change. > > Sure, but my question is about what happens if portions of the > measurement list disappear due to file/filesystem problems? How is > that handled? With the "snapshotting" solution there could be multiple files, so portions could be missing. The other solution, the preferred solution, would be one file. Any suggestions? -- thanks, Mimi ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code
Hi Baoquan, kernel test robot noticed the following build errors: [auto build test ERROR on arm64/for-next/core] [also build test ERROR on tip/x86/core powerpc/next powerpc/fixes linus/master v6.5] [cannot apply to next-20230829] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/crash_core-c-remove-unnecessary-parameter-of-function/20230829-201942 base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core patch link: https://lore.kernel.org/r/20230829121610.138107-7-bhe%40redhat.com patch subject: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code config: i386-randconfig-r026-20230830 (https://download.01.org/0day-ci/archive/20230830/202308300522.oog0v3u3-...@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230830/202308300522.oog0v3u3-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202308300522.oog0v3u3-...@intel.com/ All errors (new ones prefixed by >>): ld: kernel/crash_core.o: in function `reserve_crashkernel_generic': >> kernel/crash_core.c:450: undefined reference to `crashk_low_res' >> ld: kernel/crash_core.c:451: undefined reference to `crashk_low_res' >> ld: kernel/crash_core.c:455: undefined reference to `crashk_res' ld: kernel/crash_core.c:453: undefined reference to `crashk_res' ld: kernel/crash_core.c:454: undefined reference to `crashk_res' vim +450 kernel/crash_core.c 6bee83d29d2e09 Baoquan He 2023-08-29 375 6bee83d29d2e09 Baoquan He 2023-08-29 376 void __init reserve_crashkernel_generic(char *cmdline, 6bee83d29d2e09 Baoquan He 2023-08-29 377unsigned long long crash_size, 6bee83d29d2e09 Baoquan He 2023-08-29 378unsigned long long crash_base, 6bee83d29d2e09 Baoquan He 2023-08-29 379unsigned long long crash_low_size, 6bee83d29d2e09 Baoquan He 2023-08-29 380bool high) 6bee83d29d2e09 Baoquan He 2023-08-29 381 { 6bee83d29d2e09 Baoquan He 2023-08-29 382 unsigned long long search_end = CRASH_ADDR_LOW_MAX, search_base = 0; 6bee83d29d2e09 Baoquan He 2023-08-29 383 bool fixed_base = false; 6bee83d29d2e09 Baoquan He 2023-08-29 384 6bee83d29d2e09 Baoquan He 2023-08-29 385 /* User specifies base address explicitly. */ 6bee83d29d2e09 Baoquan He 2023-08-29 386 if (crash_base) { 6bee83d29d2e09 Baoquan He 2023-08-29 387 fixed_base = true; 6bee83d29d2e09 Baoquan He 2023-08-29 388 search_base = crash_base; 6bee83d29d2e09 Baoquan He 2023-08-29 389 search_end = crash_base + crash_size; 6bee83d29d2e09 Baoquan He 2023-08-29 390 } 6bee83d29d2e09 Baoquan He 2023-08-29 391 6bee83d29d2e09 Baoquan He 2023-08-29 392 if (high) { 6bee83d29d2e09 Baoquan He 2023-08-29 393 search_base = CRASH_ADDR_LOW_MAX; 6bee83d29d2e09 Baoquan He 2023-08-29 394 search_end = CRASH_ADDR_HIGH_MAX; 6bee83d29d2e09 Baoquan He 2023-08-29 395 } 6bee83d29d2e09 Baoquan He 2023-08-29 396 6bee83d29d2e09 Baoquan He 2023-08-29 397 retry: 6bee83d29d2e09 Baoquan He 2023-08-29 398 crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, 6bee83d29d2e09 Baoquan He 2023-08-29 399 search_base, search_end); 6bee83d29d2e09 Baoquan He 2023-08-29 400 if (!crash_base) { 6bee83d29d2e09 Baoquan He 2023-08-29 401 /* 6bee83d29d2e09 Baoquan He 2023-08-29 402* For crashkernel=size[KMG]@offset[KMG], print out failure 6bee83d29d2e09 Baoquan He 2023-08-29 403* message if can't reserve the specified region. 6bee83d29d2e09 Baoquan He 2023-08-29 404*/ 6bee83d29d2e09 Baoquan He 2023-08-29 405 if (fixed_base) { 6bee83d29d2e09 Baoquan He 2023-08-29 406 pr_warn("crashkernel reservation failed - memory is in use.\n"); 6bee83d29d2e09 Baoquan He 2023-08-29 407 return; 6bee83d29d2e09 Baoquan He 2023-08-29 408 } 6bee83d29d2e09 Baoquan He 2023-08-29 409 6bee83d29d2e09 Baoquan He 2023-08-29 410 /* 6bee83d29d2e09 Baoquan He 2023-08-29 411* For crashkernel=size[KMG], if the first attempt was for 6bee83d29d2e09 Baoquan He 2023-08-29 412* low memory, fall back to high memory, the minimum required 6bee83d29d2e09 Baoquan He 2023-08-2
Re: [RFC] IMA Log Snapshotting Design Proposal
On Tue, Aug 29, 2023 at 5:05 PM Mimi Zohar wrote: > On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote: > > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar wrote: > > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote: > > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote: > > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote: > > > > >>> This design seems overly complex and requires synchronization > > > > >>> between > > > > >>> the "snapshot" record and exporting the records from the measurement > > > > >>> list. None of this would be necessary if the measurements were > > > > >>> copied > > > > >>> from kernel memory to a backing file (e.g. tmpfs), as described in > > > > >>> [1]. > > > > Even if the Kernel maintains the link between a tmpfs exported and an > > > > in-memory IMA log - it still has to copy the tmpfs portion to the > > > > Kernel memory during kexec soft boot. tmpfs is cleared during kexec, > > > > so this copying of tmpfs back to kernel memory is necessary to preserve > > > > the integrity of the log during kexec. But the copying would add back > > > > the memory pressure on the node during kexec (which may result in > > > > out-of-memory), defeating the purpose of the overall effort/feature. > > > > Copying to a regular *persistent* protected file seems a cleaner > > > > approach, compared to tmpfs. > > > > > > From a kernel perspective, it doesn't make a difference if userspace > > > provides a tmpfs or persistent file. As per the discussion > > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t > > > , userspace provides the kernel with the file descriptor of the opened > > > file. > > > > > > > We prototyped this solution, however it > > > > does not seem to be a common pattern within the Kernel to write state > > > > directly to files on disk file systems. We considered two potential > > > > options: > > > > > > If no file descriptor is provided, then the measurements aren't copied > > > and removed from the securityfs file. If there are write errors, the > > > measurements aren't removed from the securityfs file until the write > > > errors are resolved. > > > > It sounds like this approach would require the file/filesystem to be > > continuously available for the life of the system once the log was > > snapshotted/overflowed to persistent storage, yes? Assuming that is > > the case, what happens if the file/filesystem becomes inaccessible at > > some point and an attestation client attempts to read the entire log? > > The main purpose of the change is to addres kernel memory pressure. > Two designs are being discussed: Sush's "snapshotting" design and > Amir's original suggestion of continously exporting the measurement > records to a tmpfs or regular file. Both designs require verifying the > initial attestation quote by walking the entire measurement list, > calculating the expected TPM PCR value(s). That doesn't change. Sure, but my question is about what happens if portions of the measurement list disappear due to file/filesystem problems? How is that handled? -- paul-moore.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote: > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar wrote: > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote: > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote: > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote: > > > >>> This design seems overly complex and requires synchronization between > > > >>> the "snapshot" record and exporting the records from the measurement > > > >>> list. None of this would be necessary if the measurements were copied > > > >>> from kernel memory to a backing file (e.g. tmpfs), as described in > > > >>> [1]. > > > Even if the Kernel maintains the link between a tmpfs exported and an > > > in-memory IMA log - it still has to copy the tmpfs portion to the > > > Kernel memory during kexec soft boot. tmpfs is cleared during kexec, > > > so this copying of tmpfs back to kernel memory is necessary to preserve > > > the integrity of the log during kexec. But the copying would add back > > > the memory pressure on the node during kexec (which may result in > > > out-of-memory), defeating the purpose of the overall effort/feature. > > > Copying to a regular *persistent* protected file seems a cleaner > > > approach, compared to tmpfs. > > > > From a kernel perspective, it doesn't make a difference if userspace > > provides a tmpfs or persistent file. As per the discussion > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t > > , userspace provides the kernel with the file descriptor of the opened > > file. > > > > > We prototyped this solution, however it > > > does not seem to be a common pattern within the Kernel to write state > > > directly to files on disk file systems. We considered two potential > > > options: > > > > If no file descriptor is provided, then the measurements aren't copied > > and removed from the securityfs file. If there are write errors, the > > measurements aren't removed from the securityfs file until the write > > errors are resolved. > > It sounds like this approach would require the file/filesystem to be > continuously available for the life of the system once the log was > snapshotted/overflowed to persistent storage, yes? Assuming that is > the case, what happens if the file/filesystem becomes inaccessible at > some point and an attestation client attempts to read the entire log? The main purpose of the change is to addres kernel memory pressure. Two designs are being discussed: Sush's "snapshotting" design and Amir's original suggestion of continously exporting the measurement records to a tmpfs or regular file. Both designs require verifying the initial attestation quote by walking the entire measurement list, calculating the expected TPM PCR value(s). That doesn't change. -- thanks, Mimi ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC] IMA Log Snapshotting Design Proposal
On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar wrote: > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote: > > On 8/14/2023 3:02 PM, Mimi Zohar wrote: > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote: > > >>> This design seems overly complex and requires synchronization between > > >>> the "snapshot" record and exporting the records from the measurement > > >>> list. None of this would be necessary if the measurements were copied > > >>> from kernel memory to a backing file (e.g. tmpfs), as described in [1]. > > Even if the Kernel maintains the link between a tmpfs exported and an > > in-memory IMA log - it still has to copy the tmpfs portion to the > > Kernel memory during kexec soft boot. tmpfs is cleared during kexec, > > so this copying of tmpfs back to kernel memory is necessary to preserve > > the integrity of the log during kexec. But the copying would add back > > the memory pressure on the node during kexec (which may result in > > out-of-memory), defeating the purpose of the overall effort/feature. > > Copying to a regular *persistent* protected file seems a cleaner > > approach, compared to tmpfs. > > From a kernel perspective, it doesn't make a difference if userspace > provides a tmpfs or persistent file. As per the discussion > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t > , userspace provides the kernel with the file descriptor of the opened > file. > > > We prototyped this solution, however it > > does not seem to be a common pattern within the Kernel to write state > > directly to files on disk file systems. We considered two potential > > options: > > If no file descriptor is provided, then the measurements aren't copied > and removed from the securityfs file. If there are write errors, the > measurements aren't removed from the securityfs file until the write > errors are resolved. It sounds like this approach would require the file/filesystem to be continuously available for the life of the system once the log was snapshotted/overflowed to persistent storage, yes? Assuming that is the case, what happens if the file/filesystem becomes inaccessible at some point and an attestation client attempts to read the entire log? -- paul-moore.com ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v2 4/8] crash_core: add generic function to do reservation
In architecture like x86_64, arm64 and riscv, they have vast virtual address space and usually have huge physical memory RAM. Their crashkernel reservation doesn't have to be limited under 4G RAM, but can be extended to the whole physical memory via crashkernel=,high support. Now add function reserve_crashkernel_generic() to reserve crashkernel memory if users specify any case of kernel pamameters, like crashkernel=xM[@offset] or crashkernel=,high|low. This is preparation to simplify code of crashkernel=,high support in architecutures. Signed-off-by: Baoquan He --- include/linux/crash_core.h | 34 ++-- kernel/crash_core.c| 109 - 2 files changed, 136 insertions(+), 7 deletions(-) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 85260bf4a734..2f732493e922 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -77,12 +77,6 @@ Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, void *data, size_t data_len); void final_note(Elf_Word *buf); -#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION -#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE -#define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) -#endif -#endif - int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base, unsigned long long *low_size, bool *high); @@ -91,4 +85,32 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base); +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION +#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE +#define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) +#endif +#ifndef CRASH_ALIGN +#define CRASH_ALIGNSZ_2M +#endif +#ifndef CRASH_ADDR_LOW_MAX +#define CRASH_ADDR_LOW_MAX SZ_4G +#endif +#ifndef CRASH_ADDR_HIGH_MAX +#define CRASH_ADDR_HIGH_MAXmemblock_end_of_DRAM() +#endif + +void __init reserve_crashkernel_generic(char *cmdline, + unsigned long long crash_size, + unsigned long long crash_base, + unsigned long long crash_low_size, + bool high); +#else +static inline void __init reserve_crashkernel_generic(char *cmdline, + unsigned long long crash_size, + unsigned long long crash_base, + unsigned long long crash_low_size, + bool high) +{} +#endif + #endif /* LINUX_CRASH_CORE_H */ diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 355b0ab5189c..6bc00cc390b5 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -5,11 +5,13 @@ */ #include -#include #include #include #include #include +#include +#include +#include #include #include @@ -349,6 +351,111 @@ static int __init parse_crashkernel_dummy(char *arg) } early_param("crashkernel", parse_crashkernel_dummy); +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION +static int __init reserve_crashkernel_low(unsigned long long low_size) +{ +#ifdef CONFIG_64BIT + unsigned long long low_base; + + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); + if (!low_base) { + pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size); + return -ENOMEM; + } + + pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n", + low_base, low_base + low_size, low_size >> 20); + + crashk_low_res.start = low_base; + crashk_low_res.end = low_base + low_size - 1; + insert_resource(_resource, _low_res); +#endif + return 0; +} + +void __init reserve_crashkernel_generic(char *cmdline, +unsigned long long crash_size, +unsigned long long crash_base, +unsigned long long crash_low_size, +bool high) +{ + unsigned long long search_end = CRASH_ADDR_LOW_MAX, search_base = 0; + bool fixed_base = false; + + /* User specifies base address explicitly. */ + if (crash_base) { + fixed_base = true; + search_base = crash_base; + search_end = crash_base + crash_size; + } + + if (high) { + search_base = CRASH_ADDR_LOW_MAX; + search_end = CRASH_ADDR_HIGH_MAX; + } + +retry: + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, + search_base, search_end); + if (!crash_base) { + /* +* For crashkernel=size[KMG]@offset[KMG], print out failure +* message if can't reserve the specified region. +*/ +
[PATCH v2 7/8] arm64: kdump: use generic interface to simplify crashkernel reservation
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps: 1) Add a new header file , and define CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and DEFAULT_CRASH_KERNEL_LOW_SIZE in ; 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(); 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/arm64/Kconfig. The old reserve_crashkernel_low() and reserve_crashkernel() can be removed. Signed-off-by: Baoquan He --- arch/arm64/Kconfig | 3 + arch/arm64/include/asm/crash_core.h | 10 ++ arch/arm64/mm/init.c| 140 ++-- 3 files changed, 21 insertions(+), 132 deletions(-) create mode 100644 arch/arm64/include/asm/crash_core.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 29db061db9bb..07fb8c71339d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1481,6 +1481,9 @@ config KEXEC_FILE for kernel and initramfs as opposed to list of segments as accepted by previous system call. +config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + def_bool CRASH_CORE + config KEXEC_SIG bool "Verify kernel signature during kexec_file_load() syscall" depends on KEXEC_FILE diff --git a/arch/arm64/include/asm/crash_core.h b/arch/arm64/include/asm/crash_core.h new file mode 100644 index ..9f5c8d339f44 --- /dev/null +++ b/arch/arm64/include/asm/crash_core.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _ARM64_CRASH_CORE_H +#define _ARM64_CRASH_CORE_H + +/* Current arm64 boot protocol requires 2MB alignment */ +#define CRASH_ALIGN SZ_2M + +#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit +#define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1) +#endif diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 4ad637508b75..48ab23531bb6 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -64,15 +64,6 @@ EXPORT_SYMBOL(memstart_addr); */ phys_addr_t __ro_after_init arm64_dma_phys_limit; -/* Current arm64 boot protocol requires 2MB alignment */ -#define CRASH_ALIGNSZ_2M - -#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit -#define CRASH_ADDR_HIGH_MAX(PHYS_MASK + 1) -#define CRASH_HIGH_SEARCH_BASE SZ_4G - -#define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) - /* * To make optimal use of block mappings when laying out the linear * mapping, round down the base of physical memory to a size that can @@ -100,140 +91,25 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit; #define ARM64_MEMSTART_ALIGN (1UL << ARM64_MEMSTART_SHIFT) #endif -static int __init reserve_crashkernel_low(unsigned long long low_size) -{ - unsigned long long low_base; - - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX); - if (!low_base) { - pr_err("cannot allocate crashkernel low memory (size:0x%llx).\n", low_size); - return -ENOMEM; - } - - pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld MB)\n", - low_base, low_base + low_size, low_size >> 20); - - crashk_low_res.start = low_base; - crashk_low_res.end = low_base + low_size - 1; - insert_resource(_resource, _low_res); - - return 0; -} - -/* - * reserve_crashkernel() - reserves memory for crash kernel - * - * This function reserves memory area given in "crashkernel=" kernel command - * line parameter. The memory reserved is used by dump capture kernel when - * primary kernel is crashing. - */ -static void __init reserve_crashkernel(void) +static void __init arch_reserve_crashkernel(void) { - unsigned long long crash_low_size = 0, search_base = 0; - unsigned long long crash_max = CRASH_ADDR_LOW_MAX; + unsigned long long low_size = 0; unsigned long long crash_base, crash_size; char *cmdline = boot_command_line; - bool fixed_base = false; bool high = false; int ret; if (!IS_ENABLED(CONFIG_KEXEC_CORE)) return; - /* crashkernel=X[@offset] */ ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), - _size, _base, NULL, NULL); - if (ret == -ENOENT) { - ret = parse_crashkernel_high(cmdline, 0, _size, _base); - if (ret || !crash_size) - return; - - /* -* crashkernel=Y,low can be specified or not, but invalid value -* is not allowed. -*/ - ret = parse_crashkernel_low(cmdline, 0, _low_size, _base); - if (ret == -ENOENT) - crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; - else if (ret) - return; - -
[PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps: 1) Add a new header file , and define CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and DEFAULT_CRASH_KERNEL_LOW_SIZE in ; 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(), and do the ARCH specific work if needed. 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/x86/Kconfig. When adding DEFAULT_CRASH_KERNEL_LOW_SIZE, add crash_low_size_default() to calculate crashkernel low memory because x86_64 has special requirement. The old reserve_crashkernel_low() and reserve_crashkernel() can be removed. Signed-off-by: Baoquan He --- arch/x86/Kconfig | 3 + arch/x86/include/asm/crash_core.h | 34 +++ arch/x86/kernel/setup.c | 144 -- 3 files changed, 53 insertions(+), 128 deletions(-) create mode 100644 arch/x86/include/asm/crash_core.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8d9e4b362572..c4539dc35985 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2037,6 +2037,9 @@ config KEXEC_FILE config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE +config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + def_bool CRASH_CORE + config KEXEC_SIG bool "Verify kernel signature during kexec_file_load() syscall" depends on KEXEC_FILE diff --git a/arch/x86/include/asm/crash_core.h b/arch/x86/include/asm/crash_core.h new file mode 100644 index ..5fc5e4f94521 --- /dev/null +++ b/arch/x86/include/asm/crash_core.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _X86_CRASH_CORE_H +#define _X86_CRASH_CORE_H + +/* 16M alignment for crash kernel regions */ +#define CRASH_ALIGN SZ_16M + +/* + * Keep the crash kernel below this limit. + * + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range + * due to mapping restrictions. + * + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is + * the upper limit of system RAM in 4-level paging mode. Since the kdump + * jump could be from 5-level paging to 4-level paging, the jump will fail if + * the kernel is put above 64 TB, and during the 1st kernel bootup there's + * no good way to detect the paging mode of the target kernel which will be + * loaded for dumping. + */ + +#ifdef CONFIG_X86_32 +# define CRASH_ADDR_LOW_MAX SZ_512M +# define CRASH_ADDR_HIGH_MAXSZ_512M +#else +# define CRASH_ADDR_LOW_MAX SZ_4G +# define CRASH_ADDR_HIGH_MAXSZ_64T +#endif + +# define DEFAULT_CRASH_KERNEL_LOW_SIZE crash_low_size_default() + +extern unsigned long crash_low_size_default(void); + +#endif /* _X86_CRASH_CORE_H */ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 382c66d2cf71..559a5c4141db 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -474,152 +474,40 @@ static void __init memblock_x86_reserve_range_setup_data(void) /* * - Crashkernel reservation -- */ - -/* 16M alignment for crash kernel regions */ -#define CRASH_ALIGNSZ_16M - -/* - * Keep the crash kernel below this limit. - * - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range - * due to mapping restrictions. - * - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is - * the upper limit of system RAM in 4-level paging mode. Since the kdump - * jump could be from 5-level paging to 4-level paging, the jump will fail if - * the kernel is put above 64 TB, and during the 1st kernel bootup there's - * no good way to detect the paging mode of the target kernel which will be - * loaded for dumping. - */ -#ifdef CONFIG_X86_32 -# define CRASH_ADDR_LOW_MAXSZ_512M -# define CRASH_ADDR_HIGH_MAX SZ_512M -#else -# define CRASH_ADDR_LOW_MAXSZ_4G -# define CRASH_ADDR_HIGH_MAX SZ_64T -#endif - -static int __init reserve_crashkernel_low(void) +unsigned long crash_low_size_default(void) { #ifdef CONFIG_X86_64 - unsigned long long base, low_base = 0, low_size = 0; - unsigned long low_mem_limit; - int ret; - - low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX); - - /* crashkernel=Y,low */ - ret = parse_crashkernel_low(boot_command_line, low_mem_limit, _size, ); - if (ret) { - /* -* two parts from kernel/dma/swiotlb.c: -* -swiotlb size: user-specified with swiotlb= or default. -* -* -swiotlb overflow buffer: now hardcoded to 32k. We round it -* to 8M for other buffers that may need to stay low too. Also -* make sure we allocate enough extra low memory so that we -* don't run out of DMA buffers for 32-bit devices. -*/ - low_size = max(swiotlb_size_or_default() + (8UL <<
[PATCH v2 8/8] crash_core.c: remove unneeded functions
So far, nobody calls functions parse_crashkernel_high() and parse_crashkernel_low(), remove both of them. Signed-off-by: Baoquan He --- include/linux/crash_core.h | 4 kernel/crash_core.c| 18 -- 2 files changed, 22 deletions(-) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 86e22e6a039f..d64006c4bd43 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -83,10 +83,6 @@ void final_note(Elf_Word *buf); int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base, unsigned long long *low_size, bool *high); -int parse_crashkernel_high(char *cmdline, unsigned long long system_ram, - unsigned long long *crash_size, unsigned long long *crash_base); -int parse_crashkernel_low(char *cmdline, unsigned long long system_ram, - unsigned long long *crash_size, unsigned long long *crash_base); #ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION #ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 6bc00cc390b5..61a8ea3b23a2 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -323,24 +323,6 @@ int __init parse_crashkernel(char *cmdline, return 0; } -int __init parse_crashkernel_high(char *cmdline, -unsigned long long system_ram, -unsigned long long *crash_size, -unsigned long long *crash_base) -{ - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - suffix_tbl[SUFFIX_HIGH]); -} - -int __init parse_crashkernel_low(char *cmdline, -unsigned long long system_ram, -unsigned long long *crash_size, -unsigned long long *crash_base) -{ - return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - suffix_tbl[SUFFIX_LOW]); -} - /* * Add a dummy early_param handler to mark crashkernel= as a known command line * parameter and suppress incorrect warnings in init/main.c. -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v2 5/8] crash_core.h: include if generic reservation is needed
In asm/crash_core.h, ARCH can provide its own macro definitions to override DEFAULT_CRASH_KERNEL_LOW_SIZE, CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX in if needed. Later, x86 and arm64 wil support the generic reservaton, so wrap the including into CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope to avoid compiling error in other ARCH-es which doesn't take the generic reservation. Signed-off-by: Baoquan He --- include/linux/crash_core.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 2f732493e922..86e22e6a039f 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -5,6 +5,9 @@ #include #include #include +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION +#include +#endif #define CRASH_CORE_NOTE_NAME "CORE" #define CRASH_CORE_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4) -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v2 2/8] crash_core: change the prototype of function parse_crashkernel()
Add two parameters 'low_size' and 'high' to function parse_crashkernel(), later crashkernel=,high|low parsing will be added. Make adjustments in all call sites of parse_crashkernel() in arch. Signed-off-by: Baoquan He --- arch/arm/kernel/setup.c | 3 ++- arch/arm64/mm/init.c | 2 +- arch/ia64/kernel/setup.c | 2 +- arch/loongarch/kernel/setup.c| 4 +++- arch/mips/kernel/setup.c | 3 ++- arch/powerpc/kernel/fadump.c | 2 +- arch/powerpc/kexec/core.c| 2 +- arch/powerpc/mm/nohash/kaslr_booke.c | 2 +- arch/riscv/mm/init.c | 2 +- arch/s390/kernel/setup.c | 4 ++-- arch/sh/kernel/machine_kexec.c | 2 +- arch/x86/kernel/setup.c | 3 ++- include/linux/crash_core.h | 3 ++- kernel/crash_core.c | 15 --- 14 files changed, 32 insertions(+), 17 deletions(-) diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index c66b560562b3..e2bb7afd0683 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -1010,7 +1010,8 @@ static void __init reserve_crashkernel(void) total_mem = get_total_mem(); ret = parse_crashkernel(boot_command_line, total_mem, - _size, _base); + _size, _base, + NULL, NULL); /* invalid value specified or crashkernel=0 */ if (ret || !crash_size) return; diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 4fcb88a445ef..4ad637508b75 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -142,7 +142,7 @@ static void __init reserve_crashkernel(void) /* crashkernel=X[@offset] */ ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), - _size, _base); + _size, _base, NULL, NULL); if (ret == -ENOENT) { ret = parse_crashkernel_high(cmdline, 0, _size, _base); if (ret || !crash_size) diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c index 5a55ac82c13a..4faea2d2cf07 100644 --- a/arch/ia64/kernel/setup.c +++ b/arch/ia64/kernel/setup.c @@ -277,7 +277,7 @@ static void __init setup_crashkernel(unsigned long total, int *n) int ret; ret = parse_crashkernel(boot_command_line, total, - , ); + , , NULL, NULL); if (ret == 0 && size > 0) { if (!base) { sort_regions(rsvd_region, *n); diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c index 9d830ab4e302..776a068d8718 100644 --- a/arch/loongarch/kernel/setup.c +++ b/arch/loongarch/kernel/setup.c @@ -267,7 +267,9 @@ static void __init arch_parse_crashkernel(void) unsigned long long crash_base, crash_size; total_mem = memblock_phys_mem_size(); - ret = parse_crashkernel(boot_command_line, total_mem, _size, _base); + ret = parse_crashkernel(boot_command_line, total_mem, + _size, _base, + NULL, NULL); if (ret < 0 || crash_size <= 0) return; diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index cb871eb784a7..08321c945ac4 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -460,7 +460,8 @@ static void __init mips_parse_crashkernel(void) total_mem = memblock_phys_mem_size(); ret = parse_crashkernel(boot_command_line, total_mem, - _size, _base); + _size, _base, + NULL, NULL); if (ret != 0 || crash_size <= 0) return; diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index ea0a073abd96..7dbdeba56e74 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -313,7 +313,7 @@ static __init u64 fadump_calculate_reserve_size(void) * memory at a predefined offset. */ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), - , ); + , , NULL, NULL); if (ret == 0 && size > 0) { unsigned long max_size; diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index de64c7962991..9346c960b296 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -109,7 +109,7 @@ void __init reserve_crashkernel(void) total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); /* use common parsing */ ret = parse_crashkernel(boot_command_line, total_mem_sz, - _size, _base); + _size, _base, NULL, NULL); if (ret == 0 && crash_size > 0) { crashk_res.start = crash_base; crashk_res.end = crash_base +
[PATCH v2 0/8] kdump: use generic functions to simplify crashkernel reservation in arch
In the current arm64, crashkernel=,high support has been finished after several rounds of posting and careful reviewing. The code in arm64 which parses crashkernel kernel parameters firstly, then reserve memory can be a good example for other ARCH to refer to. Whereas in x86_64, the code mixing crashkernel parameter parsing and memory reserving is twisted, and looks messy. Refactoring the code to make it more readable maintainable is necessary. Here, firstly abstract the crashkernel parameter parsing code into parse_crashkernel() to make it be able to parse crashkernel=,high|low. Then abstract the crashkernel memory reserving code into a generic function reserve_crashkernel_generic(). Finally, in ARCH which crashkernel=,high support is needed, a simple arch_reserve_crashkernel() can be added to call above two functions. This can remove the duplicated implmentation code in each ARCH, like arm64, x86_64. I only change the arm64 and x86_64 implementation to make use of the generic functions to simplify code. Risc-v can be done very easily refer to the steps in arm64 and x86_64. History: - RFC patchset: https://lore.kernel.org/all/20230619055951.45620-1-...@redhat.com/T/#u [RFC PATCH 0/4] kdump: add generic functions to simplify crashkernel crashkernel in architecture Dave and Philipp commented the old parse_crashkernel_common() and parse_crashkernel_generic() are quite confusing. In this formal post, I made change to address the concern by unifying all crashkernel parsing into parse_crashkernel(). v1->v2: - Change to add asm/crash_core.h into x86 and arm64 to contain those arch specific macro definitions for generic reservaton. This fixes the compiling error reported by LKP in v1. https://lore.kernel.org/all/202308272150.p3krkmof-...@intel.com/T/#u - Fix a log typo in patch 8, thanks to Samuel. Baoquan He (8): crash_core.c: remove unnecessary parameter of function crash_core: change the prototype of function parse_crashkernel() crash_core: change parse_crashkernel() to support crashkernel=,high|low parsing crash_core: add generic function to do reservation crash_core.h: include if generic reservation is needed x86: kdump: use generic interface to simplify crashkernel reservation code arm64: kdump: use generic interface to simplify crashkernel reservation crash_core.c: remove unneeded functions arch/arm/kernel/setup.c | 3 +- arch/arm64/Kconfig | 3 + arch/arm64/include/asm/crash_core.h | 10 ++ arch/arm64/mm/init.c | 140 ++ arch/ia64/kernel/setup.c | 2 +- arch/loongarch/kernel/setup.c| 4 +- arch/mips/kernel/setup.c | 3 +- arch/powerpc/kernel/fadump.c | 2 +- arch/powerpc/kexec/core.c| 2 +- arch/powerpc/mm/nohash/kaslr_booke.c | 2 +- arch/riscv/mm/init.c | 2 +- arch/s390/kernel/setup.c | 4 +- arch/sh/kernel/machine_kexec.c | 2 +- arch/x86/Kconfig | 3 + arch/x86/include/asm/crash_core.h| 34 ++ arch/x86/kernel/setup.c | 143 +++--- include/linux/crash_core.h | 38 +- kernel/crash_core.c | 170 +++ 18 files changed, 269 insertions(+), 298 deletions(-) create mode 100644 arch/arm64/include/asm/crash_core.h create mode 100644 arch/x86/include/asm/crash_core.h -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v2 3/8] crash_core: change parse_crashkernel() to support crashkernel=,high|low parsing
Now parse_crashkernel() is a real entry point for all kinds of crahskernel parsing on any architecture. And wrap the crahskernel=,high|low handling inside CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope. Signed-off-by: Baoquan He --- include/linux/crash_core.h | 6 ++ kernel/crash_core.c| 28 +++- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 2e76289699ff..85260bf4a734 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -77,6 +77,12 @@ Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, void *data, size_t data_len); void final_note(Elf_Word *buf); +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION +#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE +#define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) +#endif +#endif + int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base, unsigned long long *low_size, bool *high); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index f6a5c219e2e1..355b0ab5189c 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -276,6 +276,9 @@ static int __init __parse_crashkernel(char *cmdline, /* * That function is the entry point for command line parsing and should be * called from the arch-specific code. + * + * If crashkernel=,high|low is supported on architecture, non-NULL values + * should be passed to parameters 'low_size' and 'high'. */ int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, @@ -291,7 +294,30 @@ int __init parse_crashkernel(char *cmdline, crash_base, NULL); if (!high) return ret; - +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + else if (ret == -ENOENT) { + ret = __parse_crashkernel(cmdline, 0, crash_size, + crash_base, suffix_tbl[SUFFIX_HIGH]); + if (ret || !*crash_size) + return -1; + + /* +* crashkernel=Y,low can be specified or not, but invalid value +* is not allowed. +*/ + ret = __parse_crashkernel(cmdline, 0, low_size, + crash_base, suffix_tbl[SUFFIX_LOW]); + if (ret == -ENOENT) + *low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE; + else if (ret) + return -1; + + *high = true; + } else if (ret || !*crash_size) { + /* The specified value is invalid */ + return -1; + } +#endif return 0; } -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v2 1/8] crash_core.c: remove unnecessary parameter of function
In all call sites of __parse_crashkernel(), the parameter 'name' is hardcoded as "crashkernel=". So remove the unnecessary parameter 'name', add local varibale 'name' inside __parse_crashkernel() instead. Signed-off-by: Baoquan He --- kernel/crash_core.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 90ce1dfd591c..f27b4e45d410 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -241,11 +241,11 @@ static int __init __parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base, -const char *name, const char *suffix) { char*first_colon, *first_space; char*ck_cmdline; + char*name = "crashkernel="; BUG_ON(!crash_size || !crash_base); *crash_size = 0; @@ -283,7 +283,7 @@ int __init parse_crashkernel(char *cmdline, unsigned long long *crash_base) { return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - "crashkernel=", NULL); + NULL); } int __init parse_crashkernel_high(char *cmdline, @@ -292,7 +292,7 @@ int __init parse_crashkernel_high(char *cmdline, unsigned long long *crash_base) { return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - "crashkernel=", suffix_tbl[SUFFIX_HIGH]); + suffix_tbl[SUFFIX_HIGH]); } int __init parse_crashkernel_low(char *cmdline, @@ -301,7 +301,7 @@ int __init parse_crashkernel_low(char *cmdline, unsigned long long *crash_base) { return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base, - "crashkernel=", suffix_tbl[SUFFIX_LOW]); + suffix_tbl[SUFFIX_LOW]); } /* -- 2.41.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH -next v9 0/2] support allocating crashkernel above 4G explicitly on riscv
Hi folks, This is a friendly ping. Anyone has additional comments on this patch set? Best regards, Jiahao On 2023/7/27 1:49, Chen Jiahao wrote: On riscv, the current crash kernel allocation logic is trying to allocate within 32bit addressible memory region by default, if failed, try to allocate without 4G restriction. In need of saving DMA zone memory while allocating a relatively large crash kernel region, allocating the reserved memory top down in high memory, without overlapping the DMA zone, is a mature solution. Hence this patchset introduces the parameter option crashkernel=X,[high,low]. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Besides, there are few rules need to take notice: 1. "crashkernel=X,[high,low]" will be ignored if "crashkernel=size" is specified. 2. "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed and there is enough memory to be allocated under 4G. 3. When allocating crashkernel above 4G and no "crashkernel=X,low" is specified, a 128M low memory will be allocated automatically for swiotlb bounce buffer. See Documentation/admin-guide/kernel-parameters.txt for more information. To verify loading the crashkernel, adapted kexec-tools is attached below: https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2 Following test cases have been performed as expected: 1) crashkernel=256M //low=256M 2) crashkernel=1G//low=1G 3) crashkernel=4G//high=4G, low=128M(default) 4) crashkernel=4G crashkernel=256M,high //high=4G, low=128M(default), high is ignored 5) crashkernel=4G crashkernel=256M,low //high=4G, low=128M(default), low is ignored 6) crashkernel=4G,high //high=4G, low=128M(default) 7) crashkernel=256M,low //low=0M, invalid 8) crashkernel=4G,high crashkernel=256M,low //high=4G, low=256M 9) crashkernel=4G,high crashkernel=4G,low//high=0M, low=0M, invalid 10) crashkernel=512M@0xd000 //low=512M 11) crashkernel=1G,high crashkernel=0M,low //high=1G, low=0M Changes since [v9]: 1. As per Conor's comment, rebase to correct base on riscv/for-next branch. No code logic changed. Changes since [v8]: 1. Rebase to newest mainline head, not modifying any code logic. Changes since [v7]: 1. Minor refactor: move crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE into the !high branch when the first allocation fails. Not changing the result but further align with Arm64 logic, refer to Baoquan's comment. 2. Add test case "crashkernel=1G,high crashkernel=0M,low", the result also matches our expectation. Changes since [v6]: 1. Introduce the "high" flag to mark whether "crashkernel=X,high" is passed. Fix the retrying logic between "crashkernel=X,high" case and others when the first allocation attempt fails. Changes since [v5]: 1. Update the crashkernel allocation logic when crashkernel=X,high is specified. In this case, region above 4G will directly get reserved as crashkernel, rather than trying lower 32bit allocation first. Changes since [v4]: 1. Update some imprecise code comments for cmdline parsing. Changes since [v3]: 1. Update to print warning and return explicitly on failure when crashkernel=size@offset is specified. Not changing the result in this case but making the logic more straightforward. 2. Some minor cleanup. Changes since [v2]: 1. Update the allocation logic to ensure the high crashkernel region is reserved strictly above dma32_phys_limit. 2. Clean up some minor format problems. Chen Jiahao (2): riscv: kdump: Implement crashkernel=X,[high,low] docs: kdump: Update the crashkernel description for riscv .../admin-guide/kernel-parameters.txt | 15 +-- arch/riscv/kernel/setup.c | 5 + arch/riscv/mm/init.c | 93 +-- 3 files changed, 99 insertions(+), 14 deletions(-) ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec