Re: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code

2023-08-29 Thread kernel test robot
Hi Baoquan,

kernel test robot noticed the following build errors:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on tip/x86/core powerpc/next powerpc/fixes v6.5]
[cannot apply to linus/master next-20230829]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/crash_core-c-remove-unnecessary-parameter-of-function/20230829-201942
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
for-next/core
patch link:
https://lore.kernel.org/r/20230829121610.138107-7-bhe%40redhat.com
patch subject: [PATCH v2 6/8] x86: kdump: use generic interface to simplify 
crashkernel reservation code
config: x86_64-randconfig-r022-20230830 
(https://download.01.org/0day-ci/archive/20230830/202308300910.e0i4pijt-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20230830/202308300910.e0i4pijt-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308300910.e0i4pijt-...@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `reserve_crashkernel_low':
   kernel/crash_core.c:369: undefined reference to `crashk_low_res'
   ld: kernel/crash_core.c:369: undefined reference to `crashk_low_res'
   ld: kernel/crash_core.c:370: undefined reference to `crashk_low_res'
   ld: kernel/crash_core.c:369: undefined reference to `crashk_low_res'
   ld: kernel/crash_core.c:370: undefined reference to `crashk_low_res'
   ld: vmlinux.o:kernel/crash_core.c:371: more undefined references to 
`crashk_low_res' follow
   ld: vmlinux.o: in function `reserve_crashkernel_generic':
>> kernel/crash_core.c:453: undefined reference to `crashk_res'
>> ld: kernel/crash_core.c:453: undefined reference to `crashk_res'
   ld: kernel/crash_core.c:454: undefined reference to `crashk_res'
>> ld: kernel/crash_core.c:453: undefined reference to `crashk_res'
   ld: kernel/crash_core.c:454: undefined reference to `crashk_res'
   ld: vmlinux.o:kernel/crash_core.c:454: more undefined references to 
`crashk_res' follow


vim +453 kernel/crash_core.c

71d2bcec2d4d69 Philipp Rudo 2021-12-24  353  
6bee83d29d2e09 Baoquan He   2023-08-29  354  #ifdef 
CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
6bee83d29d2e09 Baoquan He   2023-08-29  355  static int __init 
reserve_crashkernel_low(unsigned long long low_size)
6bee83d29d2e09 Baoquan He   2023-08-29  356  {
6bee83d29d2e09 Baoquan He   2023-08-29  357  #ifdef CONFIG_64BIT
6bee83d29d2e09 Baoquan He   2023-08-29  358 unsigned long long low_base;
6bee83d29d2e09 Baoquan He   2023-08-29  359  
6bee83d29d2e09 Baoquan He   2023-08-29  360 low_base = 
memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
6bee83d29d2e09 Baoquan He   2023-08-29  361 if (!low_base) {
6bee83d29d2e09 Baoquan He   2023-08-29  362 pr_err("cannot allocate 
crashkernel low memory (size:0x%llx).\n", low_size);
6bee83d29d2e09 Baoquan He   2023-08-29  363 return -ENOMEM;
6bee83d29d2e09 Baoquan He   2023-08-29  364 }
6bee83d29d2e09 Baoquan He   2023-08-29  365  
6bee83d29d2e09 Baoquan He   2023-08-29  366 pr_info("crashkernel low memory 
reserved: 0x%08llx - 0x%08llx (%lld MB)\n",
6bee83d29d2e09 Baoquan He   2023-08-29  367 low_base, low_base + 
low_size, low_size >> 20);
6bee83d29d2e09 Baoquan He   2023-08-29  368  
6bee83d29d2e09 Baoquan He   2023-08-29 @369 crashk_low_res.start = low_base;
6bee83d29d2e09 Baoquan He   2023-08-29  370 crashk_low_res.end   = low_base 
+ low_size - 1;
6bee83d29d2e09 Baoquan He   2023-08-29  371 
insert_resource(_resource, _low_res);
6bee83d29d2e09 Baoquan He   2023-08-29  372  #endif
6bee83d29d2e09 Baoquan He   2023-08-29  373 return 0;
6bee83d29d2e09 Baoquan He   2023-08-29  374  }
6bee83d29d2e09 Baoquan He   2023-08-29  375  
6bee83d29d2e09 Baoquan He   2023-08-29  376  void __init 
reserve_crashkernel_generic(char *cmdline,
6bee83d29d2e09 Baoquan He   2023-08-29  377  unsigned 
long long crash_size,
6bee83d29d2e09 Baoquan He   2023-08-29  378  unsigned 
long long crash_base,
6bee83d29d2e09 Baoquan He   2023-08-29  379  unsigned 
long long crash_low_size,
6bee83d29d2e09 Baoquan He   2023-08-29  380  bool high)
6bee83d29d2e09 Baoquan He   2023-08-29  381  {
6bee83d29d2e09 Baoquan He   2023-08-29  382 unsigned long long search_end = 
CRASH_ADDR_LOW_MAX, search_base = 0;
6bee83d29d2e09 Baoquan He   2023-08-29  383 bool fixed_base = 

Re: [RFC] IMA Log Snapshotting Design Proposal

2023-08-29 Thread Paul Moore
On Tue, Aug 29, 2023 at 5:54 PM Mimi Zohar  wrote:
> On Tue, 2023-08-29 at 17:30 -0400, Paul Moore wrote:
> > On Tue, Aug 29, 2023 at 5:05 PM Mimi Zohar  wrote:
> > > On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote:
> > > > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar  wrote:
> > > > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote:
> > > > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote:
> > > > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote:
> > > > > > >>> This design seems overly complex and requires synchronization 
> > > > > > >>> between
> > > > > > >>> the "snapshot" record and exporting the records from the 
> > > > > > >>> measurement
> > > > > > >>> list.  None of this would be necessary if the measurements were 
> > > > > > >>> copied
> > > > > > >>> from kernel memory to a backing file (e.g. tmpfs), as described 
> > > > > > >>> in [1].
> > > > > > Even if the Kernel maintains the link between a tmpfs exported and 
> > > > > > an
> > > > > > in-memory IMA log - it still has to copy the tmpfs portion to the
> > > > > > Kernel memory during kexec soft boot.  tmpfs is cleared during 
> > > > > > kexec,
> > > > > > so this copying of tmpfs back to kernel memory is necessary to 
> > > > > > preserve
> > > > > > the integrity of the log during kexec.  But the copying would add 
> > > > > > back
> > > > > > the memory pressure on the node during kexec (which may result in
> > > > > > out-of-memory), defeating the purpose of the overall effort/feature.
> > > > > > Copying to a regular *persistent* protected file seems a cleaner
> > > > > > approach, compared to tmpfs.
> > > > >
> > > > > From a kernel perspective, it doesn't make a difference if userspace
> > > > > provides a tmpfs or persistent file.  As per the discussion
> > > > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t
> > > > > , userspace provides the kernel with the file descriptor of the opened
> > > > > file.
> > > > >
> > > > > > We prototyped this solution, however it
> > > > > > does not seem to be a common pattern within the Kernel to write 
> > > > > > state
> > > > > > directly to files on disk file systems.  We considered two potential
> > > > > > options:
> > > > >
> > > > > If no file descriptor is provided, then the measurements aren't copied
> > > > > and removed from the securityfs file.  If there are write errors, the
> > > > > measurements aren't removed from the securityfs file until the write
> > > > > errors are resolved.
> > > >
> > > > It sounds like this approach would require the file/filesystem to be
> > > > continuously available for the life of the system once the log was
> > > > snapshotted/overflowed to persistent storage, yes?  Assuming that is
> > > > the case, what happens if the file/filesystem becomes inaccessible at
> > > > some point and an attestation client attempts to read the entire log?
> > >
> > > The main purpose of the change is to addres kernel memory pressure.
> > > Two designs are being discussed: Sush's "snapshotting" design and
> > > Amir's original suggestion of continously exporting the measurement
> > > records to a tmpfs or regular file.  Both designs require verifying the
> > > initial attestation quote by walking the entire measurement list,
> > > calculating the expected TPM PCR value(s).  That doesn't change.
> >
> > Sure, but my question is about what happens if portions of the
> > measurement list disappear due to file/filesystem problems?  How is
> > that handled?
>
> With the "snapshotting" solution there could be multiple files, so
> portions could be missing.  The other solution, the preferred solution,
> would be one file.

With the snapshotting approach the kernel doesn't need to maintain
ongoing access to a file, that is left up to the user process
performing the attestation (or simply inspecting the logs).  I have to
ask, for the third time now in as many hours, how does the proposed
kernel-holds-an-fd-open solution handle the case where the
file/filesystem is no longer accessible?  The snapshotting approach
should be much more resilient here as the error recovery paths can be
much more involved than what we would have available in the kernel,
not to mention the flexibility of allowing a user process to determine
how to store and manage the snapshotted log.

Considering that the snapshotting approach is opt-in (userspace has to
initiate the snapshot), I'm not sure the concern over log offsets is a
significant objection: existing userspace will never trigger a
snapshot, and new userspace that could potentially trigger a snapshot
should be written to take that into account.

-- 
paul-moore.com

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC PATCH] Introduce persistent memory pool

2023-08-29 Thread Stanislav Kinsburskii
On Mon, Aug 28, 2023 at 10:50:19PM +0200, Alexander Graf wrote:
> +kexec, iommu, kvm
> 
> On 23.08.23 04:45, Stanislav Kinsburskii wrote:
> > 
> > +akpm, +linux-mm
> > 
> > On Fri, Aug 25, 2023 at 01:32:40PM +, Gowans, James wrote:
> > > On Fri, 2023-08-25 at 10:05 +0200, Greg Kroah-Hartman wrote:
> > > 
> > > Thanks for adding me to this thread Greg!
> > > 
> > > > On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote:
> > > > > This patch addresses the need for a memory allocator dedicated to
> > > > > persistent memory within the kernel. This allocator will preserve
> > > > > kernel-specific states like DMA passthrough device states, IOMMU 
> > > > > state, and
> > > > > more across kexec.
> > > > > The proposed solution offers a foundational implementation for 
> > > > > potential
> > > > > custom solutions that might follow. Though the implementation is
> > > > > intentionally kept concise and straightforward to foster discussion 
> > > > > and
> > > > > feedback, it's fully functional in its current state.
> > > Hi Stanislav, it looks like we're working on similar things. I'm looking
> > > to develop a mechanism to support hypervisor live update for when KVM is
> > > running VMs with PCI device passthrough. VMs with device passthrough
> > > also necessitates passing and re-hydrating IOMMU state so that DMA can
> > > continue during live update.
> > > 
> > > Planning on having an LPC session on this topic:
> > > https://lpc.events/event/17/abstracts/1629/ (currently it's only a
> > > submitted abstract so not sure if visible, hopefully it will be soon).
> > > 
> > > We are looking at implementing persistence across kexec via an in-memory
> > > filesystem on top of reserved memory. This would have files for anything
> > > that needs to be persisted. That includes files for IOMMU pgtables, for
> > > guest memory or userspace-accessible memory.
> > > 
> > > It may be nice to solve all kexec persistence requirements with one
> > > solution, but we can consider IOMMU separately. There are at least three
> > > ways that this can be done:
> > > a) carving out reserved memory for pgtables. This is done by your
> > > proposal here, as well as my suggestion of a filesystem.
> > > b) pre/post kexec hooks for drivers to serialise state and pass it
> > > across in a structured format from old to new kernel.
> > > c) Reconstructing IOMMU state in the new kernel by starting at the
> > > hardware registers and walking the page tables. No state passing needed.
> > > 
> > > Have you considered option (b) and (c) here? One of the implications of
> > > (b) and (c) are that they would need to hook into the buddy allocator
> > > really early to be able to carve out the reconstructed page tables
> > > before the allocator is used. Similar to how pkram [0] hooks in early to
> > > carve out pages used for its filesystem.
> > > 
> > Hi James,
> > 
> > We are indeed working on similar things, so thanks for chiming in.
> > I've seen pkram proposal as well as your comments there.
> > 
> > I think (b) will need some persistent-over-kexec memory to pass the
> > state across kexec as well as some key-value store persisted as well.
> > And the proposed persistent memory pool is aimed exactly for this
> > purpose.
> > Or do you imply some other way to pass driver's data accross kexec?
> 
> 
> If I had to build this, I'd probably do it just like device tree passing on
> ARM. It's a single, physically contiguous blob of data whose entry point you
> pass to the target kernel. IIRC ACPI passing works similarly. This would
> just be one more opaque data structure that then needs very strict
> versioning and forward/backward compat guarantees.
> 

Device tree or ACPI are options indeed. However AFAIU in case of DT user
space has to involved into the picture to modify and complie it, while
ACPI isn't flexible or easily extendable.
Also, AFAIU both these standards were designed with passing
hardware-specific data in mind from bootstrap software to an OS kernel
and thus were never really intended to be used for creating a persistent
state accross kexec.
To me, an attempt to use either of them to pass kernel-specific data looks
like an abuse (or misuse) excused by the simplicity of implementation.

> 
> > I dind't consider (c) yet, thanks for for the pointer.
> > 
> > I have a question in this scope: how is PCI devices registers state is 
> > persisted
> > across kexec with the files system you are working on? I.e. how does
> > driver know, that the device shouldn't not be reinitialized?
> 
> 
> The easiest way to do it initially would be kernel command line options that
> hack up the drivers. But I suppose depending on the option we go with, you
> can also use the respective "natural" path:
> 
> (a) A special metadata file that explains the state to the driver
> (b) An entry in the structured file format that explains the state to the
> target driver
> (c) Compatible target drivers try to enumerate state from the target
> 

Re: [RFC] IMA Log Snapshotting Design Proposal

2023-08-29 Thread Mimi Zohar
On Tue, 2023-08-29 at 17:30 -0400, Paul Moore wrote:
> On Tue, Aug 29, 2023 at 5:05 PM Mimi Zohar  wrote:
> > On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote:
> > > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar  wrote:
> > > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote:
> > > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote:
> > > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote:
> > > > > >>> This design seems overly complex and requires synchronization 
> > > > > >>> between
> > > > > >>> the "snapshot" record and exporting the records from the 
> > > > > >>> measurement
> > > > > >>> list.  None of this would be necessary if the measurements were 
> > > > > >>> copied
> > > > > >>> from kernel memory to a backing file (e.g. tmpfs), as described 
> > > > > >>> in [1].
> > > > > Even if the Kernel maintains the link between a tmpfs exported and an
> > > > > in-memory IMA log - it still has to copy the tmpfs portion to the
> > > > > Kernel memory during kexec soft boot.  tmpfs is cleared during kexec,
> > > > > so this copying of tmpfs back to kernel memory is necessary to 
> > > > > preserve
> > > > > the integrity of the log during kexec.  But the copying would add back
> > > > > the memory pressure on the node during kexec (which may result in
> > > > > out-of-memory), defeating the purpose of the overall effort/feature.
> > > > > Copying to a regular *persistent* protected file seems a cleaner
> > > > > approach, compared to tmpfs.
> > > >
> > > > From a kernel perspective, it doesn't make a difference if userspace
> > > > provides a tmpfs or persistent file.  As per the discussion
> > > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t
> > > > , userspace provides the kernel with the file descriptor of the opened
> > > > file.
> > > >
> > > > > We prototyped this solution, however it
> > > > > does not seem to be a common pattern within the Kernel to write state
> > > > > directly to files on disk file systems.  We considered two potential
> > > > > options:
> > > >
> > > > If no file descriptor is provided, then the measurements aren't copied
> > > > and removed from the securityfs file.  If there are write errors, the
> > > > measurements aren't removed from the securityfs file until the write
> > > > errors are resolved.
> > >
> > > It sounds like this approach would require the file/filesystem to be
> > > continuously available for the life of the system once the log was
> > > snapshotted/overflowed to persistent storage, yes?  Assuming that is
> > > the case, what happens if the file/filesystem becomes inaccessible at
> > > some point and an attestation client attempts to read the entire log?
> >
> > The main purpose of the change is to addres kernel memory pressure.
> > Two designs are being discussed: Sush's "snapshotting" design and
> > Amir's original suggestion of continously exporting the measurement
> > records to a tmpfs or regular file.  Both designs require verifying the
> > initial attestation quote by walking the entire measurement list,
> > calculating the expected TPM PCR value(s).  That doesn't change.
> 
> Sure, but my question is about what happens if portions of the
> measurement list disappear due to file/filesystem problems?  How is
> that handled?

With the "snapshotting" solution there could be multiple files, so
portions could be missing.  The other solution, the preferred solution,
would be one file.

Any suggestions?

-- 
thanks,

Mimi



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code

2023-08-29 Thread kernel test robot
Hi Baoquan,

kernel test robot noticed the following build errors:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on tip/x86/core powerpc/next powerpc/fixes linus/master 
v6.5]
[cannot apply to next-20230829]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/crash_core-c-remove-unnecessary-parameter-of-function/20230829-201942
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
for-next/core
patch link:
https://lore.kernel.org/r/20230829121610.138107-7-bhe%40redhat.com
patch subject: [PATCH v2 6/8] x86: kdump: use generic interface to simplify 
crashkernel reservation code
config: i386-randconfig-r026-20230830 
(https://download.01.org/0day-ci/archive/20230830/202308300522.oog0v3u3-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20230830/202308300522.oog0v3u3-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202308300522.oog0v3u3-...@intel.com/

All errors (new ones prefixed by >>):

   ld: kernel/crash_core.o: in function `reserve_crashkernel_generic':
>> kernel/crash_core.c:450: undefined reference to `crashk_low_res'
>> ld: kernel/crash_core.c:451: undefined reference to `crashk_low_res'
>> ld: kernel/crash_core.c:455: undefined reference to `crashk_res'
   ld: kernel/crash_core.c:453: undefined reference to `crashk_res'
   ld: kernel/crash_core.c:454: undefined reference to `crashk_res'


vim +450 kernel/crash_core.c

6bee83d29d2e09 Baoquan He 2023-08-29  375  
6bee83d29d2e09 Baoquan He 2023-08-29  376  void __init 
reserve_crashkernel_generic(char *cmdline,
6bee83d29d2e09 Baoquan He 2023-08-29  377unsigned 
long long crash_size,
6bee83d29d2e09 Baoquan He 2023-08-29  378unsigned 
long long crash_base,
6bee83d29d2e09 Baoquan He 2023-08-29  379unsigned 
long long crash_low_size,
6bee83d29d2e09 Baoquan He 2023-08-29  380bool high)
6bee83d29d2e09 Baoquan He 2023-08-29  381  {
6bee83d29d2e09 Baoquan He 2023-08-29  382   unsigned long long search_end = 
CRASH_ADDR_LOW_MAX, search_base = 0;
6bee83d29d2e09 Baoquan He 2023-08-29  383   bool fixed_base = false;
6bee83d29d2e09 Baoquan He 2023-08-29  384  
6bee83d29d2e09 Baoquan He 2023-08-29  385   /* User specifies base address 
explicitly. */
6bee83d29d2e09 Baoquan He 2023-08-29  386   if (crash_base) {
6bee83d29d2e09 Baoquan He 2023-08-29  387   fixed_base = true;
6bee83d29d2e09 Baoquan He 2023-08-29  388   search_base = 
crash_base;
6bee83d29d2e09 Baoquan He 2023-08-29  389   search_end = crash_base 
+ crash_size;
6bee83d29d2e09 Baoquan He 2023-08-29  390   }
6bee83d29d2e09 Baoquan He 2023-08-29  391  
6bee83d29d2e09 Baoquan He 2023-08-29  392   if (high) {
6bee83d29d2e09 Baoquan He 2023-08-29  393   search_base = 
CRASH_ADDR_LOW_MAX;
6bee83d29d2e09 Baoquan He 2023-08-29  394   search_end = 
CRASH_ADDR_HIGH_MAX;
6bee83d29d2e09 Baoquan He 2023-08-29  395   }
6bee83d29d2e09 Baoquan He 2023-08-29  396  
6bee83d29d2e09 Baoquan He 2023-08-29  397  retry:
6bee83d29d2e09 Baoquan He 2023-08-29  398   crash_base = 
memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
6bee83d29d2e09 Baoquan He 2023-08-29  399   
   search_base, search_end);
6bee83d29d2e09 Baoquan He 2023-08-29  400   if (!crash_base) {
6bee83d29d2e09 Baoquan He 2023-08-29  401   /*
6bee83d29d2e09 Baoquan He 2023-08-29  402* For 
crashkernel=size[KMG]@offset[KMG], print out failure
6bee83d29d2e09 Baoquan He 2023-08-29  403* message if can't 
reserve the specified region.
6bee83d29d2e09 Baoquan He 2023-08-29  404*/
6bee83d29d2e09 Baoquan He 2023-08-29  405   if (fixed_base) {
6bee83d29d2e09 Baoquan He 2023-08-29  406   
pr_warn("crashkernel reservation failed - memory is in use.\n");
6bee83d29d2e09 Baoquan He 2023-08-29  407   return;
6bee83d29d2e09 Baoquan He 2023-08-29  408   }
6bee83d29d2e09 Baoquan He 2023-08-29  409  
6bee83d29d2e09 Baoquan He 2023-08-29  410   /*
6bee83d29d2e09 Baoquan He 2023-08-29  411* For 
crashkernel=size[KMG], if the first attempt was for
6bee83d29d2e09 Baoquan He 2023-08-29  412* low memory, fall 
back to high memory, the minimum required
6bee83d29d2e09 Baoquan He 2023-08-2

Re: [RFC] IMA Log Snapshotting Design Proposal

2023-08-29 Thread Paul Moore
On Tue, Aug 29, 2023 at 5:05 PM Mimi Zohar  wrote:
> On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote:
> > On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar  wrote:
> > > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote:
> > > > On 8/14/2023 3:02 PM, Mimi Zohar wrote:
> > > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote:
> > > > >>> This design seems overly complex and requires synchronization 
> > > > >>> between
> > > > >>> the "snapshot" record and exporting the records from the measurement
> > > > >>> list.  None of this would be necessary if the measurements were 
> > > > >>> copied
> > > > >>> from kernel memory to a backing file (e.g. tmpfs), as described in 
> > > > >>> [1].
> > > > Even if the Kernel maintains the link between a tmpfs exported and an
> > > > in-memory IMA log - it still has to copy the tmpfs portion to the
> > > > Kernel memory during kexec soft boot.  tmpfs is cleared during kexec,
> > > > so this copying of tmpfs back to kernel memory is necessary to preserve
> > > > the integrity of the log during kexec.  But the copying would add back
> > > > the memory pressure on the node during kexec (which may result in
> > > > out-of-memory), defeating the purpose of the overall effort/feature.
> > > > Copying to a regular *persistent* protected file seems a cleaner
> > > > approach, compared to tmpfs.
> > >
> > > From a kernel perspective, it doesn't make a difference if userspace
> > > provides a tmpfs or persistent file.  As per the discussion
> > > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t
> > > , userspace provides the kernel with the file descriptor of the opened
> > > file.
> > >
> > > > We prototyped this solution, however it
> > > > does not seem to be a common pattern within the Kernel to write state
> > > > directly to files on disk file systems.  We considered two potential
> > > > options:
> > >
> > > If no file descriptor is provided, then the measurements aren't copied
> > > and removed from the securityfs file.  If there are write errors, the
> > > measurements aren't removed from the securityfs file until the write
> > > errors are resolved.
> >
> > It sounds like this approach would require the file/filesystem to be
> > continuously available for the life of the system once the log was
> > snapshotted/overflowed to persistent storage, yes?  Assuming that is
> > the case, what happens if the file/filesystem becomes inaccessible at
> > some point and an attestation client attempts to read the entire log?
>
> The main purpose of the change is to addres kernel memory pressure.
> Two designs are being discussed: Sush's "snapshotting" design and
> Amir's original suggestion of continously exporting the measurement
> records to a tmpfs or regular file.  Both designs require verifying the
> initial attestation quote by walking the entire measurement list,
> calculating the expected TPM PCR value(s).  That doesn't change.

Sure, but my question is about what happens if portions of the
measurement list disappear due to file/filesystem problems?  How is
that handled?

-- 
paul-moore.com

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC] IMA Log Snapshotting Design Proposal

2023-08-29 Thread Mimi Zohar
On Tue, 2023-08-29 at 15:34 -0400, Paul Moore wrote:
> On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar  wrote:
> > On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote:
> > > On 8/14/2023 3:02 PM, Mimi Zohar wrote:
> > > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote:
> > > >>> This design seems overly complex and requires synchronization between
> > > >>> the "snapshot" record and exporting the records from the measurement
> > > >>> list.  None of this would be necessary if the measurements were copied
> > > >>> from kernel memory to a backing file (e.g. tmpfs), as described in 
> > > >>> [1].
> > > Even if the Kernel maintains the link between a tmpfs exported and an
> > > in-memory IMA log - it still has to copy the tmpfs portion to the
> > > Kernel memory during kexec soft boot.  tmpfs is cleared during kexec,
> > > so this copying of tmpfs back to kernel memory is necessary to preserve
> > > the integrity of the log during kexec.  But the copying would add back
> > > the memory pressure on the node during kexec (which may result in
> > > out-of-memory), defeating the purpose of the overall effort/feature.
> > > Copying to a regular *persistent* protected file seems a cleaner
> > > approach, compared to tmpfs.
> >
> > From a kernel perspective, it doesn't make a difference if userspace
> > provides a tmpfs or persistent file.  As per the discussion
> > https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t
> > , userspace provides the kernel with the file descriptor of the opened
> > file.
> >
> > > We prototyped this solution, however it
> > > does not seem to be a common pattern within the Kernel to write state
> > > directly to files on disk file systems.  We considered two potential
> > > options:
> >
> > If no file descriptor is provided, then the measurements aren't copied
> > and removed from the securityfs file.  If there are write errors, the
> > measurements aren't removed from the securityfs file until the write
> > errors are resolved.
> 
> It sounds like this approach would require the file/filesystem to be
> continuously available for the life of the system once the log was
> snapshotted/overflowed to persistent storage, yes?  Assuming that is
> the case, what happens if the file/filesystem becomes inaccessible at
> some point and an attestation client attempts to read the entire log?

The main purpose of the change is to addres kernel memory pressure. 
Two designs are being discussed: Sush's "snapshotting" design and
Amir's original suggestion of continously exporting the measurement
records to a tmpfs or regular file.  Both designs require verifying the
initial attestation quote by walking the entire measurement list,
calculating the expected TPM PCR value(s).  That doesn't change.

-- 
thanks,

Mimi


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC] IMA Log Snapshotting Design Proposal

2023-08-29 Thread Paul Moore
On Mon, Aug 21, 2023 at 7:08 PM Mimi Zohar  wrote:
> On Mon, 2023-08-21 at 15:05 -0700, Sush Shringarputale wrote:
> > On 8/14/2023 3:02 PM, Mimi Zohar wrote:
> > > On Mon, 2023-08-14 at 14:42 -0700, Sush Shringarputale wrote:
> > >>> This design seems overly complex and requires synchronization between
> > >>> the "snapshot" record and exporting the records from the measurement
> > >>> list.  None of this would be necessary if the measurements were copied
> > >>> from kernel memory to a backing file (e.g. tmpfs), as described in [1].
> > Even if the Kernel maintains the link between a tmpfs exported and an
> > in-memory IMA log - it still has to copy the tmpfs portion to the
> > Kernel memory during kexec soft boot.  tmpfs is cleared during kexec,
> > so this copying of tmpfs back to kernel memory is necessary to preserve
> > the integrity of the log during kexec.  But the copying would add back
> > the memory pressure on the node during kexec (which may result in
> > out-of-memory), defeating the purpose of the overall effort/feature.
> > Copying to a regular *persistent* protected file seems a cleaner
> > approach, compared to tmpfs.
>
> From a kernel perspective, it doesn't make a difference if userspace
> provides a tmpfs or persistent file.  As per the discussion
> https://lore.kernel.org/linux-integrity/caoq4uxj4pv2wr1wgvbcdr-tna5dszt3rvddzkgah1aev_-r...@mail.gmail.com/#t
> , userspace provides the kernel with the file descriptor of the opened
> file.
>
> > We prototyped this solution, however it
> > does not seem to be a common pattern within the Kernel to write state
> > directly to files on disk file systems.  We considered two potential
> > options:
>
> If no file descriptor is provided, then the measurements aren't copied
> and removed from the securityfs file.  If there are write errors, the
> measurements aren't removed from the securityfs file until the write
> errors are resolved.

It sounds like this approach would require the file/filesystem to be
continuously available for the life of the system once the log was
snapshotted/overflowed to persistent storage, yes?  Assuming that is
the case, what happens if the file/filesystem becomes inaccessible at
some point and an attestation client attempts to read the entire log?

-- 
paul-moore.com

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 4/8] crash_core: add generic function to do reservation

2023-08-29 Thread Baoquan He
In architecture like x86_64, arm64 and riscv, they have vast virtual
address space and usually have huge physical memory RAM. Their
crashkernel reservation doesn't have to be limited under 4G RAM,
but can be extended to the whole physical memory via crashkernel=,high
support.

Now add function reserve_crashkernel_generic() to reserve crashkernel
memory if users specify any case of kernel pamameters, like
crashkernel=xM[@offset] or crashkernel=,high|low.

This is preparation to simplify code of crashkernel=,high support
in architecutures.

Signed-off-by: Baoquan He 
---
 include/linux/crash_core.h |  34 ++--
 kernel/crash_core.c| 109 -
 2 files changed, 136 insertions(+), 7 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 85260bf4a734..2f732493e922 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -77,12 +77,6 @@ Elf_Word *append_elf_note(Elf_Word *buf, char *name, 
unsigned int type,
  void *data, size_t data_len);
 void final_note(Elf_Word *buf);
 
-#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
-#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE
-#define DEFAULT_CRASH_KERNEL_LOW_SIZE  (128UL << 20)
-#endif
-#endif
-
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base,
unsigned long long *low_size, bool *high);
@@ -91,4 +85,32 @@ int parse_crashkernel_high(char *cmdline, unsigned long long 
system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 
+#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE
+#define DEFAULT_CRASH_KERNEL_LOW_SIZE  (128UL << 20)
+#endif
+#ifndef CRASH_ALIGN
+#define CRASH_ALIGNSZ_2M
+#endif
+#ifndef CRASH_ADDR_LOW_MAX
+#define CRASH_ADDR_LOW_MAX SZ_4G
+#endif
+#ifndef CRASH_ADDR_HIGH_MAX
+#define CRASH_ADDR_HIGH_MAXmemblock_end_of_DRAM()
+#endif
+
+void __init reserve_crashkernel_generic(char *cmdline,
+   unsigned long long crash_size,
+   unsigned long long crash_base,
+   unsigned long long crash_low_size,
+   bool high);
+#else
+static inline void __init reserve_crashkernel_generic(char *cmdline,
+   unsigned long long crash_size,
+   unsigned long long crash_base,
+   unsigned long long crash_low_size,
+   bool high)
+{}
+#endif
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 355b0ab5189c..6bc00cc390b5 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -5,11 +5,13 @@
  */
 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -349,6 +351,111 @@ static int __init parse_crashkernel_dummy(char *arg)
 }
 early_param("crashkernel", parse_crashkernel_dummy);
 
+#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+static int __init reserve_crashkernel_low(unsigned long long low_size)
+{
+#ifdef CONFIG_64BIT
+   unsigned long long low_base;
+
+   low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
CRASH_ADDR_LOW_MAX);
+   if (!low_base) {
+   pr_err("cannot allocate crashkernel low memory 
(size:0x%llx).\n", low_size);
+   return -ENOMEM;
+   }
+
+   pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld 
MB)\n",
+   low_base, low_base + low_size, low_size >> 20);
+
+   crashk_low_res.start = low_base;
+   crashk_low_res.end   = low_base + low_size - 1;
+   insert_resource(_resource, _low_res);
+#endif
+   return 0;
+}
+
+void __init reserve_crashkernel_generic(char *cmdline,
+unsigned long long crash_size,
+unsigned long long crash_base,
+unsigned long long crash_low_size,
+bool high)
+{
+   unsigned long long search_end = CRASH_ADDR_LOW_MAX, search_base = 0;
+   bool fixed_base = false;
+
+   /* User specifies base address explicitly. */
+   if (crash_base) {
+   fixed_base = true;
+   search_base = crash_base;
+   search_end = crash_base + crash_size;
+   }
+
+   if (high) {
+   search_base = CRASH_ADDR_LOW_MAX;
+   search_end = CRASH_ADDR_HIGH_MAX;
+   }
+
+retry:
+   crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
+  search_base, search_end);
+   if (!crash_base) {
+   /*
+* For crashkernel=size[KMG]@offset[KMG], print out failure
+* message if can't reserve the specified region.
+*/
+

[PATCH v2 7/8] arm64: kdump: use generic interface to simplify crashkernel reservation

2023-08-29 Thread Baoquan He
With the help of newly changed function parse_crashkernel() and
generic reserve_crashkernel_generic(), crashkernel reservation can be
simplified by steps:

1) Add a new header file , and define CRASH_ALIGN,
   CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
   DEFAULT_CRASH_KERNEL_LOW_SIZE in ;

2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
   reserve_crashkernel_generic();

3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
   arch/arm64/Kconfig.

The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.

Signed-off-by: Baoquan He 
---
 arch/arm64/Kconfig  |   3 +
 arch/arm64/include/asm/crash_core.h |  10 ++
 arch/arm64/mm/init.c| 140 ++--
 3 files changed, 21 insertions(+), 132 deletions(-)
 create mode 100644 arch/arm64/include/asm/crash_core.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 29db061db9bb..07fb8c71339d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1481,6 +1481,9 @@ config KEXEC_FILE
  for kernel and initramfs as opposed to list of segments as
  accepted by previous system call.
 
+config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+   def_bool CRASH_CORE
+
 config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
diff --git a/arch/arm64/include/asm/crash_core.h 
b/arch/arm64/include/asm/crash_core.h
new file mode 100644
index ..9f5c8d339f44
--- /dev/null
+++ b/arch/arm64/include/asm/crash_core.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ARM64_CRASH_CORE_H
+#define _ARM64_CRASH_CORE_H
+
+/* Current arm64 boot protocol requires 2MB alignment */
+#define CRASH_ALIGN SZ_2M
+
+#define CRASH_ADDR_LOW_MAX  arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX (PHYS_MASK + 1)
+#endif
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 4ad637508b75..48ab23531bb6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -64,15 +64,6 @@ EXPORT_SYMBOL(memstart_addr);
  */
 phys_addr_t __ro_after_init arm64_dma_phys_limit;
 
-/* Current arm64 boot protocol requires 2MB alignment */
-#define CRASH_ALIGNSZ_2M
-
-#define CRASH_ADDR_LOW_MAX arm64_dma_phys_limit
-#define CRASH_ADDR_HIGH_MAX(PHYS_MASK + 1)
-#define CRASH_HIGH_SEARCH_BASE SZ_4G
-
-#define DEFAULT_CRASH_KERNEL_LOW_SIZE  (128UL << 20)
-
 /*
  * To make optimal use of block mappings when laying out the linear
  * mapping, round down the base of physical memory to a size that can
@@ -100,140 +91,25 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
 #define ARM64_MEMSTART_ALIGN   (1UL << ARM64_MEMSTART_SHIFT)
 #endif
 
-static int __init reserve_crashkernel_low(unsigned long long low_size)
-{
-   unsigned long long low_base;
-
-   low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
CRASH_ADDR_LOW_MAX);
-   if (!low_base) {
-   pr_err("cannot allocate crashkernel low memory 
(size:0x%llx).\n", low_size);
-   return -ENOMEM;
-   }
-
-   pr_info("crashkernel low memory reserved: 0x%08llx - 0x%08llx (%lld 
MB)\n",
-   low_base, low_base + low_size, low_size >> 20);
-
-   crashk_low_res.start = low_base;
-   crashk_low_res.end   = low_base + low_size - 1;
-   insert_resource(_resource, _low_res);
-
-   return 0;
-}
-
-/*
- * reserve_crashkernel() - reserves memory for crash kernel
- *
- * This function reserves memory area given in "crashkernel=" kernel command
- * line parameter. The memory reserved is used by dump capture kernel when
- * primary kernel is crashing.
- */
-static void __init reserve_crashkernel(void)
+static void __init arch_reserve_crashkernel(void)
 {
-   unsigned long long crash_low_size = 0, search_base = 0;
-   unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
+   unsigned long long low_size = 0;
unsigned long long crash_base, crash_size;
char *cmdline = boot_command_line;
-   bool fixed_base = false;
bool high = false;
int ret;
 
if (!IS_ENABLED(CONFIG_KEXEC_CORE))
return;
 
-   /* crashkernel=X[@offset] */
ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
-   _size, _base, NULL, NULL);
-   if (ret == -ENOENT) {
-   ret = parse_crashkernel_high(cmdline, 0, _size, 
_base);
-   if (ret || !crash_size)
-   return;
-
-   /*
-* crashkernel=Y,low can be specified or not, but invalid value
-* is not allowed.
-*/
-   ret = parse_crashkernel_low(cmdline, 0, _low_size, 
_base);
-   if (ret == -ENOENT)
-   crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
-   else if (ret)
-   return;
-
-   

[PATCH v2 6/8] x86: kdump: use generic interface to simplify crashkernel reservation code

2023-08-29 Thread Baoquan He
With the help of newly changed function parse_crashkernel() and
generic reserve_crashkernel_generic(), crashkernel reservation can be
simplified by steps:

1) Add a new header file , and define CRASH_ALIGN,
   CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and
   DEFAULT_CRASH_KERNEL_LOW_SIZE in ;

2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
   reserve_crashkernel_generic(), and do the ARCH specific work if
   needed.

3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
   arch/x86/Kconfig.

When adding DEFAULT_CRASH_KERNEL_LOW_SIZE, add crash_low_size_default()
to calculate crashkernel low memory because x86_64 has special
requirement.

The old reserve_crashkernel_low() and reserve_crashkernel() can be
removed.

Signed-off-by: Baoquan He 
---
 arch/x86/Kconfig  |   3 +
 arch/x86/include/asm/crash_core.h |  34 +++
 arch/x86/kernel/setup.c   | 144 --
 3 files changed, 53 insertions(+), 128 deletions(-)
 create mode 100644 arch/x86/include/asm/crash_core.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8d9e4b362572..c4539dc35985 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2037,6 +2037,9 @@ config KEXEC_FILE
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
 
+config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+   def_bool CRASH_CORE
+
 config KEXEC_SIG
bool "Verify kernel signature during kexec_file_load() syscall"
depends on KEXEC_FILE
diff --git a/arch/x86/include/asm/crash_core.h 
b/arch/x86/include/asm/crash_core.h
new file mode 100644
index ..5fc5e4f94521
--- /dev/null
+++ b/arch/x86/include/asm/crash_core.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _X86_CRASH_CORE_H
+#define _X86_CRASH_CORE_H
+
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN SZ_16M
+
+/*
+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX SZ_512M
+# define CRASH_ADDR_HIGH_MAXSZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX SZ_4G
+# define CRASH_ADDR_HIGH_MAXSZ_64T
+#endif
+
+# define DEFAULT_CRASH_KERNEL_LOW_SIZE crash_low_size_default()
+
+extern unsigned long crash_low_size_default(void);
+
+#endif /* _X86_CRASH_CORE_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 382c66d2cf71..559a5c4141db 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -474,152 +474,40 @@ static void __init 
memblock_x86_reserve_range_setup_data(void)
 /*
  * - Crashkernel reservation --
  */
-
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGNSZ_16M
-
-/*
- * Keep the crash kernel below this limit.
- *
- * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
- * due to mapping restrictions.
- *
- * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
- * the upper limit of system RAM in 4-level paging mode. Since the kdump
- * jump could be from 5-level paging to 4-level paging, the jump will fail if
- * the kernel is put above 64 TB, and during the 1st kernel bootup there's
- * no good way to detect the paging mode of the target kernel which will be
- * loaded for dumping.
- */
-#ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAXSZ_512M
-# define CRASH_ADDR_HIGH_MAX   SZ_512M
-#else
-# define CRASH_ADDR_LOW_MAXSZ_4G
-# define CRASH_ADDR_HIGH_MAX   SZ_64T
-#endif
-
-static int __init reserve_crashkernel_low(void)
+unsigned long crash_low_size_default(void)
 {
 #ifdef CONFIG_X86_64
-   unsigned long long base, low_base = 0, low_size = 0;
-   unsigned long low_mem_limit;
-   int ret;
-
-   low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
-
-   /* crashkernel=Y,low */
-   ret = parse_crashkernel_low(boot_command_line, low_mem_limit, 
_size, );
-   if (ret) {
-   /*
-* two parts from kernel/dma/swiotlb.c:
-* -swiotlb size: user-specified with swiotlb= or default.
-*
-* -swiotlb overflow buffer: now hardcoded to 32k. We round it
-* to 8M for other buffers that may need to stay low too. Also
-* make sure we allocate enough extra low memory so that we
-* don't run out of DMA buffers for 32-bit devices.
-*/
-   low_size = max(swiotlb_size_or_default() + (8UL << 

[PATCH v2 8/8] crash_core.c: remove unneeded functions

2023-08-29 Thread Baoquan He
So far, nobody calls functions parse_crashkernel_high() and
parse_crashkernel_low(), remove both of them.

Signed-off-by: Baoquan He 
---
 include/linux/crash_core.h |  4 
 kernel/crash_core.c| 18 --
 2 files changed, 22 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 86e22e6a039f..d64006c4bd43 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -83,10 +83,6 @@ void final_note(Elf_Word *buf);
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base,
unsigned long long *low_size, bool *high);
-int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
-   unsigned long long *crash_size, unsigned long long *crash_base);
-int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
-   unsigned long long *crash_size, unsigned long long *crash_base);
 
 #ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
 #ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 6bc00cc390b5..61a8ea3b23a2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -323,24 +323,6 @@ int __init parse_crashkernel(char *cmdline,
return 0;
 }
 
-int __init parse_crashkernel_high(char *cmdline,
-unsigned long long system_ram,
-unsigned long long *crash_size,
-unsigned long long *crash_base)
-{
-   return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-   suffix_tbl[SUFFIX_HIGH]);
-}
-
-int __init parse_crashkernel_low(char *cmdline,
-unsigned long long system_ram,
-unsigned long long *crash_size,
-unsigned long long *crash_base)
-{
-   return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-   suffix_tbl[SUFFIX_LOW]);
-}
-
 /*
  * Add a dummy early_param handler to mark crashkernel= as a known command line
  * parameter and suppress incorrect warnings in init/main.c.
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 5/8] crash_core.h: include if generic reservation is needed

2023-08-29 Thread Baoquan He
In asm/crash_core.h, ARCH can provide its own macro definitions to
override DEFAULT_CRASH_KERNEL_LOW_SIZE, CRASH_ALIGN, CRASH_ADDR_LOW_MAX,
CRASH_ADDR_HIGH_MAX in  if needed.

Later, x86 and arm64 wil support the generic reservaton, so wrap the
including into CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery
scope to avoid compiling error in other ARCH-es which doesn't take the
generic reservation.

Signed-off-by: Baoquan He 
---
 include/linux/crash_core.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 2f732493e922..86e22e6a039f 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -5,6 +5,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+#include 
+#endif
 
 #define CRASH_CORE_NOTE_NAME  "CORE"
 #define CRASH_CORE_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 2/8] crash_core: change the prototype of function parse_crashkernel()

2023-08-29 Thread Baoquan He
Add two parameters 'low_size' and 'high' to function parse_crashkernel(),
later crashkernel=,high|low parsing will be added. Make adjustments in all
call sites of parse_crashkernel() in arch.

Signed-off-by: Baoquan He 
---
 arch/arm/kernel/setup.c  |  3 ++-
 arch/arm64/mm/init.c |  2 +-
 arch/ia64/kernel/setup.c |  2 +-
 arch/loongarch/kernel/setup.c|  4 +++-
 arch/mips/kernel/setup.c |  3 ++-
 arch/powerpc/kernel/fadump.c |  2 +-
 arch/powerpc/kexec/core.c|  2 +-
 arch/powerpc/mm/nohash/kaslr_booke.c |  2 +-
 arch/riscv/mm/init.c |  2 +-
 arch/s390/kernel/setup.c |  4 ++--
 arch/sh/kernel/machine_kexec.c   |  2 +-
 arch/x86/kernel/setup.c  |  3 ++-
 include/linux/crash_core.h   |  3 ++-
 kernel/crash_core.c  | 15 ---
 14 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index c66b560562b3..e2bb7afd0683 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -1010,7 +1010,8 @@ static void __init reserve_crashkernel(void)
 
total_mem = get_total_mem();
ret = parse_crashkernel(boot_command_line, total_mem,
-   _size, _base);
+   _size, _base,
+   NULL, NULL);
/* invalid value specified or crashkernel=0 */
if (ret || !crash_size)
return;
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 4fcb88a445ef..4ad637508b75 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -142,7 +142,7 @@ static void __init reserve_crashkernel(void)
 
/* crashkernel=X[@offset] */
ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
-   _size, _base);
+   _size, _base, NULL, NULL);
if (ret == -ENOENT) {
ret = parse_crashkernel_high(cmdline, 0, _size, 
_base);
if (ret || !crash_size)
diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 5a55ac82c13a..4faea2d2cf07 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -277,7 +277,7 @@ static void __init setup_crashkernel(unsigned long total, 
int *n)
int ret;
 
ret = parse_crashkernel(boot_command_line, total,
-   , );
+   , , NULL, NULL);
if (ret == 0 && size > 0) {
if (!base) {
sort_regions(rsvd_region, *n);
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index 9d830ab4e302..776a068d8718 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -267,7 +267,9 @@ static void __init arch_parse_crashkernel(void)
unsigned long long crash_base, crash_size;
 
total_mem = memblock_phys_mem_size();
-   ret = parse_crashkernel(boot_command_line, total_mem, _size, 
_base);
+   ret = parse_crashkernel(boot_command_line, total_mem,
+   _size, _base,
+   NULL, NULL);
if (ret < 0 || crash_size <= 0)
return;
 
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index cb871eb784a7..08321c945ac4 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -460,7 +460,8 @@ static void __init mips_parse_crashkernel(void)
 
total_mem = memblock_phys_mem_size();
ret = parse_crashkernel(boot_command_line, total_mem,
-   _size, _base);
+   _size, _base,
+   NULL, NULL);
if (ret != 0 || crash_size <= 0)
return;
 
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ea0a073abd96..7dbdeba56e74 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -313,7 +313,7 @@ static __init u64 fadump_calculate_reserve_size(void)
 * memory at a predefined offset.
 */
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
-   , );
+   , , NULL, NULL);
if (ret == 0 && size > 0) {
unsigned long max_size;
 
diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
index de64c7962991..9346c960b296 100644
--- a/arch/powerpc/kexec/core.c
+++ b/arch/powerpc/kexec/core.c
@@ -109,7 +109,7 @@ void __init reserve_crashkernel(void)
total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
/* use common parsing */
ret = parse_crashkernel(boot_command_line, total_mem_sz,
-   _size, _base);
+   _size, _base, NULL, NULL);
if (ret == 0 && crash_size > 0) {
crashk_res.start = crash_base;
crashk_res.end = crash_base + 

[PATCH v2 0/8] kdump: use generic functions to simplify crashkernel reservation in arch

2023-08-29 Thread Baoquan He
In the current arm64, crashkernel=,high support has been finished after
several rounds of posting and careful reviewing. The code in arm64 which
parses crashkernel kernel parameters firstly, then reserve memory can be
a good example for other ARCH to refer to.

Whereas in x86_64, the code mixing crashkernel parameter parsing and
memory reserving is twisted, and looks messy. Refactoring the code to
make it more readable maintainable is necessary.

Here, firstly abstract the crashkernel parameter parsing code into
parse_crashkernel() to make it be able to parse crashkernel=,high|low.
Then abstract the crashkernel memory reserving code into a generic
function reserve_crashkernel_generic(). Finally, in ARCH which
crashkernel=,high support is needed, a simple arch_reserve_crashkernel()
can be added to call above two functions. This can remove the duplicated
implmentation code in each ARCH, like arm64, x86_64.

I only change the arm64 and x86_64 implementation to make use of the
generic functions to simplify code. Risc-v can be done very easily refer
to the steps in arm64 and x86_64.

History:
- RFC patchset:
  https://lore.kernel.org/all/20230619055951.45620-1-...@redhat.com/T/#u
  [RFC PATCH 0/4] kdump: add generic functions to simplify crashkernel 
crashkernel in architecture
  Dave and Philipp commented the old parse_crashkernel_common() and
  parse_crashkernel_generic() are quite confusing. In this formal post,
  I made change to address the concern by unifying all crashkernel
  parsing into parse_crashkernel().

v1->v2:
- Change to add asm/crash_core.h into x86 and arm64 to contain those
  arch specific macro definitions for generic reservaton. This fixes the
  compiling error reported by LKP in v1.
  https://lore.kernel.org/all/202308272150.p3krkmof-...@intel.com/T/#u
- Fix a log typo in patch 8, thanks to Samuel.

Baoquan He (8):
  crash_core.c: remove unnecessary parameter of function
  crash_core: change the prototype of function parse_crashkernel()
  crash_core: change parse_crashkernel() to support
crashkernel=,high|low parsing
  crash_core: add generic function to do reservation
  crash_core.h: include  if generic reservation is
needed
  x86: kdump: use generic interface to simplify crashkernel reservation
code
  arm64: kdump: use generic interface to simplify crashkernel
reservation
  crash_core.c: remove unneeded functions

 arch/arm/kernel/setup.c  |   3 +-
 arch/arm64/Kconfig   |   3 +
 arch/arm64/include/asm/crash_core.h  |  10 ++
 arch/arm64/mm/init.c | 140 ++
 arch/ia64/kernel/setup.c |   2 +-
 arch/loongarch/kernel/setup.c|   4 +-
 arch/mips/kernel/setup.c |   3 +-
 arch/powerpc/kernel/fadump.c |   2 +-
 arch/powerpc/kexec/core.c|   2 +-
 arch/powerpc/mm/nohash/kaslr_booke.c |   2 +-
 arch/riscv/mm/init.c |   2 +-
 arch/s390/kernel/setup.c |   4 +-
 arch/sh/kernel/machine_kexec.c   |   2 +-
 arch/x86/Kconfig |   3 +
 arch/x86/include/asm/crash_core.h|  34 ++
 arch/x86/kernel/setup.c  | 143 +++---
 include/linux/crash_core.h   |  38 +-
 kernel/crash_core.c  | 170 +++
 18 files changed, 269 insertions(+), 298 deletions(-)
 create mode 100644 arch/arm64/include/asm/crash_core.h
 create mode 100644 arch/x86/include/asm/crash_core.h

-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 3/8] crash_core: change parse_crashkernel() to support crashkernel=,high|low parsing

2023-08-29 Thread Baoquan He
Now parse_crashkernel() is a real entry point for all kinds of
crahskernel parsing on any architecture.

And wrap the crahskernel=,high|low handling inside
CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope.

Signed-off-by: Baoquan He 
---
 include/linux/crash_core.h |  6 ++
 kernel/crash_core.c| 28 +++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 2e76289699ff..85260bf4a734 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -77,6 +77,12 @@ Elf_Word *append_elf_note(Elf_Word *buf, char *name, 
unsigned int type,
  void *data, size_t data_len);
 void final_note(Elf_Word *buf);
 
+#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+#ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE
+#define DEFAULT_CRASH_KERNEL_LOW_SIZE  (128UL << 20)
+#endif
+#endif
+
 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base,
unsigned long long *low_size, bool *high);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index f6a5c219e2e1..355b0ab5189c 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -276,6 +276,9 @@ static int __init __parse_crashkernel(char *cmdline,
 /*
  * That function is the entry point for command line parsing and should be
  * called from the arch-specific code.
+ *
+ * If crashkernel=,high|low is supported on architecture, non-NULL values
+ * should be passed to parameters 'low_size' and 'high'.
  */
 int __init parse_crashkernel(char *cmdline,
 unsigned long long system_ram,
@@ -291,7 +294,30 @@ int __init parse_crashkernel(char *cmdline,
crash_base, NULL);
if (!high)
return ret;
-
+#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+   else if (ret == -ENOENT) {
+   ret = __parse_crashkernel(cmdline, 0, crash_size,
+   crash_base, suffix_tbl[SUFFIX_HIGH]);
+   if (ret || !*crash_size)
+   return -1;
+
+   /*
+* crashkernel=Y,low can be specified or not, but invalid value
+* is not allowed.
+*/
+   ret = __parse_crashkernel(cmdline, 0, low_size,
+   crash_base, suffix_tbl[SUFFIX_LOW]);
+   if (ret == -ENOENT)
+   *low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE;
+   else if (ret)
+   return -1;
+
+   *high = true;
+   } else if (ret || !*crash_size) {
+   /* The specified value is invalid */
+   return -1;
+   }
+#endif
return 0;
 }
 
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 1/8] crash_core.c: remove unnecessary parameter of function

2023-08-29 Thread Baoquan He
In all call sites of __parse_crashkernel(), the parameter 'name' is
hardcoded as "crashkernel=". So remove the unnecessary parameter 'name',
add local varibale 'name' inside __parse_crashkernel() instead.

Signed-off-by: Baoquan He 
---
 kernel/crash_core.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..f27b4e45d410 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -241,11 +241,11 @@ static int __init __parse_crashkernel(char *cmdline,
 unsigned long long system_ram,
 unsigned long long *crash_size,
 unsigned long long *crash_base,
-const char *name,
 const char *suffix)
 {
char*first_colon, *first_space;
char*ck_cmdline;
+   char*name = "crashkernel=";
 
BUG_ON(!crash_size || !crash_base);
*crash_size = 0;
@@ -283,7 +283,7 @@ int __init parse_crashkernel(char *cmdline,
 unsigned long long *crash_base)
 {
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-   "crashkernel=", NULL);
+   NULL);
 }
 
 int __init parse_crashkernel_high(char *cmdline,
@@ -292,7 +292,7 @@ int __init parse_crashkernel_high(char *cmdline,
 unsigned long long *crash_base)
 {
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-   "crashkernel=", suffix_tbl[SUFFIX_HIGH]);
+   suffix_tbl[SUFFIX_HIGH]);
 }
 
 int __init parse_crashkernel_low(char *cmdline,
@@ -301,7 +301,7 @@ int __init parse_crashkernel_low(char *cmdline,
 unsigned long long *crash_base)
 {
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
-   "crashkernel=", suffix_tbl[SUFFIX_LOW]);
+   suffix_tbl[SUFFIX_LOW]);
 }
 
 /*
-- 
2.41.0


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH -next v9 0/2] support allocating crashkernel above 4G explicitly on riscv

2023-08-29 Thread chenjiahao (C)

Hi folks,

This is a friendly ping. Anyone has additional comments on this patch set?

Best regards,
Jiahao

On 2023/7/27 1:49, Chen Jiahao wrote:

On riscv, the current crash kernel allocation logic is trying to
allocate within 32bit addressible memory region by default, if
failed, try to allocate without 4G restriction.

In need of saving DMA zone memory while allocating a relatively large
crash kernel region, allocating the reserved memory top down in
high memory, without overlapping the DMA zone, is a mature solution.
Hence this patchset introduces the parameter option crashkernel=X,[high,low].

One can reserve the crash kernel from high memory above DMA zone range
by explicitly passing "crashkernel=X,high"; or reserve a memory range
below 4G with "crashkernel=X,low". Besides, there are few rules need
to take notice:
1. "crashkernel=X,[high,low]" will be ignored if "crashkernel=size"
is specified.
2. "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed
and there is enough memory to be allocated under 4G.
3. When allocating crashkernel above 4G and no "crashkernel=X,low" is
specified, a 128M low memory will be allocated automatically for
swiotlb bounce buffer.
See Documentation/admin-guide/kernel-parameters.txt for more information.

To verify loading the crashkernel, adapted kexec-tools is attached below:
https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2

Following test cases have been performed as expected:
1) crashkernel=256M  //low=256M
2) crashkernel=1G//low=1G
3) crashkernel=4G//high=4G, low=128M(default)
4) crashkernel=4G crashkernel=256M,high  //high=4G, low=128M(default), high 
is ignored
5) crashkernel=4G crashkernel=256M,low   //high=4G, low=128M(default), low 
is ignored
6) crashkernel=4G,high   //high=4G, low=128M(default)
7) crashkernel=256M,low  //low=0M, invalid
8) crashkernel=4G,high crashkernel=256M,low  //high=4G, low=256M
9) crashkernel=4G,high crashkernel=4G,low//high=0M, low=0M, invalid
10) crashkernel=512M@0xd000  //low=512M
11) crashkernel=1G,high crashkernel=0M,low   //high=1G, low=0M

Changes since [v9]:
1. As per Conor's comment, rebase to correct base on riscv/for-next
branch. No code logic changed.

Changes since [v8]:
1. Rebase to newest mainline head, not modifying any code logic.

Changes since [v7]:
1. Minor refactor: move crash_low_size = DEFAULT_CRASH_KERNEL_LOW_SIZE
into the !high branch when the first allocation fails. Not changing
the result but further align with Arm64 logic, refer to Baoquan's
comment.
2. Add test case "crashkernel=1G,high crashkernel=0M,low", the result
also matches our expectation.

Changes since [v6]:
1. Introduce the "high" flag to mark whether "crashkernel=X,high"
is passed. Fix the retrying logic between "crashkernel=X,high"
case and others when the first allocation attempt fails.

Changes since [v5]:
1. Update the crashkernel allocation logic when crashkernel=X,high
is specified. In this case, region above 4G will directly get
reserved as crashkernel, rather than trying lower 32bit allocation
first.

Changes since [v4]:
1. Update some imprecise code comments for cmdline parsing.

Changes since [v3]:
1. Update to print warning and return explicitly on failure when
crashkernel=size@offset is specified. Not changing the result
in this case but making the logic more straightforward.
2. Some minor cleanup.

Changes since [v2]:
1. Update the allocation logic to ensure the high crashkernel
region is reserved strictly above dma32_phys_limit.
2. Clean up some minor format problems.

Chen Jiahao (2):
   riscv: kdump: Implement crashkernel=X,[high,low]
   docs: kdump: Update the crashkernel description for riscv

  .../admin-guide/kernel-parameters.txt | 15 +--
  arch/riscv/kernel/setup.c |  5 +
  arch/riscv/mm/init.c  | 93 +--
  3 files changed, 99 insertions(+), 14 deletions(-)



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec