Re: kexec reports "Cannot get kernel _text symbol address" on arm64 platform
On 08/12/23 at 07:11am, Baoquan He wrote:
> On 08/11/23 at 01:27pm, Pandey, Radhey Shyam wrote:
> > > -Original Message-
> > > From: b...@redhat.com
> > > Sent: Wednesday, August 9, 2023 7:42 AM
> > > To: Pandey, Radhey Shyam ; pi...@redhat.com
> > > Cc: kexec@lists.infradead.org; linux-ker...@vger.kernel.org
> > > Subject: Re: kexec reports "Cannot get kernel _text symbol address" on arm64 platform
> > >
> > > On 08/08/23 at 07:17pm, Pandey, Radhey Shyam wrote:
> > > > Hi,
> > > >
> > > > I am trying to bring up kdump on an arm64 platform[1], but I get "Cannot get kernel _text symbol address".
> > > >
> > > > Is there some dump-capture kernel config option that I am missing?
> > > >
> > > > FYI, the complete kexec debug log is copied below.
> > > >
> > > > [1]: https://www.xilinx.com/products/boards-and-kits/vck190.html
> > >
> > > Your description isn't clear. You saw that message printed; did your kdump kernel loading then succeed or not?
> > >
> > > If not, have you tried applying Pingfan's patchset and still seen the issue?
> > >
> > > [PATCHv7 0/5] arm64: zboot support
> > > https://lore.kernel.org/all/20230803024152.11663-1-pi...@redhat.com/T/#u
> >
> > I was able to proceed further: the crash kernel loaded, and I triggered a system crash with
> > echo c > /proc/sysrq-trigger
> >
> > But when I copy /proc/vmcore it throws a memory abort. I also see that the size of /proc/vmcore is implausibly huge (18446603353488633856).
>
> This is a better symptom description.
>
> It's very similar to a solved issue, even though the calltrace is not completely the same. Can you try the patch below to see if it fixes your problem?

Oops, I was wrong. The patch below is irrelevant: it addresses a kcore issue, while you hit a vmcore issue, so please ignore it. We need to investigate to see what is happening.
> > [PATCH] fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
> > https://lore.kernel.org/all/20230731215021.70911-1-lstoa...@gmail.com/T/#u
Re: kexec reports "Cannot get kernel _text symbol address" on arm64 platform
On 08/11/23 at 01:27pm, Pandey, Radhey Shyam wrote:
> > -Original Message-
> > From: b...@redhat.com
> > Sent: Wednesday, August 9, 2023 7:42 AM
> > To: Pandey, Radhey Shyam ; pi...@redhat.com
> > Cc: kexec@lists.infradead.org; linux-ker...@vger.kernel.org
> > Subject: Re: kexec reports "Cannot get kernel _text symbol address" on arm64 platform
> >
> > On 08/08/23 at 07:17pm, Pandey, Radhey Shyam wrote:
> > > Hi,
> > >
> > > I am trying to bring up kdump on an arm64 platform[1], but I get "Cannot get kernel _text symbol address".
> > >
> > > Is there some dump-capture kernel config option that I am missing?
> > >
> > > FYI, the complete kexec debug log is copied below.
> > >
> > > [1]: https://www.xilinx.com/products/boards-and-kits/vck190.html
> >
> > Your description isn't clear. You saw that message printed; did your kdump kernel loading then succeed or not?
> >
> > If not, have you tried applying Pingfan's patchset and still seen the issue?
> >
> > [PATCHv7 0/5] arm64: zboot support
> > https://lore.kernel.org/all/20230803024152.11663-1-pi...@redhat.com/T/#u
>
> I was able to proceed further: the crash kernel loaded, and I triggered a system crash with
> echo c > /proc/sysrq-trigger
>
> But when I copy /proc/vmcore it throws a memory abort. I also see that the size of /proc/vmcore is implausibly huge (18446603353488633856).

This is a better symptom description.

It's very similar to a solved issue, even though the calltrace is not completely the same. Can you try the patch below to see if it fixes your problem?

[PATCH] fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
https://lore.kernel.org/all/20230731215021.70911-1-lstoa...@gmail.com/T/#u

> Any possible guess on what could be wrong?
>
> [ 80.733523] Starting crashdump kernel...
> [ 80.737435] Bye!
> [0.00] Booting Linux on physical CPU 0x01 [0x410fd083]
> [0.00] Linux version 6.5.0-rc4-ge28001fb4e07 (radheys@xhdradheys41) (aarch64-xilinx-linux-gcc.real (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0.20220819) #23 SMP Fri Aug 11 16:25:34 IST 2023
>
> xilinx-vck190-20232:/run/media/mmcblk0p1# cat /proc/meminfo | head
> MemTotal:2092876 kB
> MemFree: 1219928 kB
> MemAvailable:1166004 kB
> Buffers: 32 kB
> Cached: 756952 kB
> SwapCached:0 kB
> Active: 1480 kB
> Inactive: 24164 kB
> Active(anon): 1452 kB
> Inactive(anon):24160 kB
> xilinx-vck190-20232:/run/media/mmcblk0p1# cp /proc/vmcore dump
> [ 975.284865] Unable to handle kernel level 3 address size fault at virtual address 80008d7cf000
> [ 975.293871] Mem abort info:
> [ 975.296669] ESR = 0x9603
> [ 975.300425] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 975.305738] SET = 0, FnV = 0
> [ 975.308788] EA = 0, S1PTW = 0
> [ 975.311925] FSC = 0x03: level 3 address size fault
> [ 975.316888] Data abort info:
> [ 975.319763] ISV = 0, ISS = 0x0003, ISS2 = 0x
> [ 975.325245] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 975.330292] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 975.335599] swapper pgtable: 4k pages, 48-bit VAs, pgdp=05016ef6b000
> [ 975.342297] [80008d7cf000] pgd=1501eddfe003, p4d=1501eddfe003, pud=1501eddfd003, pmd=15017b695003, pte=00687fff84000703
> [ 975.354827] Internal error: Oops: 9603 [#4] SMP
> [ 975.360392] Modules linked in:
> [ 975.363440] CPU: 0 PID: 664 Comm: cp Tainted: G D 6.5.0-rc4-ge28001fb4e07 #23
> [ 975.372822] Hardware name: Xilinx Versal vck190 Eval board revA (DT)
> [ 975.379165] pstate: a005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 975.386119] pc : __memcpy+0x110/0x230
> [ 975.389783] lr : _copy_to_iter+0x3d8/0x4d0
> [ 975.393874] sp : 80008dc939a0
> [ 975.397178] x29: 80008dc939a0 x28: 05013c1bea30 x27: 1000
> [ 975.404309] x26: 1000 x25: 1000 x24: 80008d7cf000
> [ 975.411440] x23: 0400 x22: 80008dc93ba0 x21: 1000
> [ 975.418570] x20: x19: 1000 x18:
> [ 975.425699] x17: x16: x15: 0140
> [ 975.432829] x14: 8500a9919000 x13: 0041 x12: fffef6831000
> [ 975.439958] x11: 80008d9cf000 x10: x9 :
> [ 975.447088] x8 : 80008d7d x7 : 0501addfd358 x6 : 0401
> [ 975.454217] x5 : 0501370e9000 x4 : 80008d7d x3 :
> [ 975.461346] x2 : 1000 x1 : 80008d7cf000 x0 : 0501370e8000
> [ 975.468476] Call trace:
> [ 975.470912] __memcpy+0x110/0x230
> [ 975.474221] copy_oldmem_page+0x70/0xac
> [ 975.478050] read_from_oldmem.part.0+0x120/0x188
> [ 975.482663] read_vmcore+0x14c/0x238
> [ 975.486231] proc_reg_read_iter+0x84/0xd8
> [ 975.490233]
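A side note on the implausible /proc/vmcore size reported in this thread: reinterpreting 18446603353488633856 as a signed or hexadecimal 64-bit value suggests the size was derived from a wrapped-around (negative) quantity that looks like a kernel virtual address, not a real length. A quick sketch of that arithmetic (my own sanity check, not from the thread):

```python
import struct

reported = 18446603353488633856  # /proc/vmcore size seen by 'cp'

# Reinterpret the unsigned 64-bit value as a signed 64-bit value.
(signed,) = struct.unpack("<q", struct.pack("<Q", reported))

print(signed)          # a small-magnitude negative number => u64 wraparound
print(hex(reported))   # high bits 0xffff8... resemble an arm64 kernel VA
```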
Re: [RFC] IMA Log Snapshotting Design Proposal
On 8/11/23 11:57, Tushar Sugandhi wrote:

[1] https://patchwork.kernel.org/project/linux-integrity/cover/20230801181917.8535-1-tusha...@linux.microsoft.com/

The shards will need to be written to some sort of standard location, or a config file needs to be defined, so that everyone knows where to find them and how they are named.

We thought about a well-known standard location earlier. Letting the kernel choose the name/location of the snapshot file comes with its own complexity. Our initial stance is that we don't want to handle that at the kernel level, and instead let the UM client choose the location/naming of the snapshot files. But we are happy to reconsider if the community requests it.

I would also let user space do the snapshotting, but all applications relying on shards should know where they are located on the system and what the naming scheme is, so they can be processed in proper order. evmctl, for example, would have to know where the shards are if the keylime agent had taken snapshots.

Yes.

If the "PCR quotes in the snapshot_aggregate event in IMA log"

PCR quote or 'quotes'? Why multiple? From your proposal, though you may have changed your opinion following what I see in other messages: "- The Kernel will get the current TPM PCR values and PCR update counter [2] and store them as template data in a new IMA event "snapshot_aggregate"." Afaik TPM quotes don't give you the state of the individual PCR values; therefore I would expect to at least find the 'PCR values' of all the PCRs that IMA touched in the snapshot_aggregate, so I can replay all the following events on top of these PCR values and come up with the values that were used in the "final PCR quote". This is unless you expect the server to take an automatic snapshot of the values of the PCRs that it computed while evaluating the log, in case it ever needs to go back.

I meant a single set of PCR values captured when snapshot_aggregate is logged. Sorry for the confusion.

Ok.
+ "replay of the rest of the events in the IMA log" results in "final PCR quotes" that match the "AK-signed PCR quotes" sent by the client, then the truncated IMA log can be trusted. The verifier can either 'trust' the "PCR quotes in the snapshot_aggregate event in IMA log", or it can ask for the (n-1)th snapshot shard to check the past events.

For anything regarding determining the 'trustworthiness of a system', one would have to be able to go back to the very beginning of the log *or* remember what state a system was in when the latest snapshot was taken, so that if a restart happens it can resume with that assumption about the state of trustworthiness and know what the values of the PCRs were at that time, so it can resume replaying the log (or the server would get these values from the log).

Correct. We intend to support the above. I hope our proposal description captures it. BTW, when you say 'restart', you mean the UM process restart, right? Because in case of a kernel restart

Yes, client restart, not reboot.

(i.e. cold boot) the past IMA log (and the TPM state) is lost, and old snapshots (if any) are useless.

Right. Some script should run on boot and delete all contents of the directory where the log shards are.

The AK quotes by the kernel (which adds a 2nd AK key) that James is proposing could be useful if the entire log, consisting of multiple shards, is very large and cannot be transferred from the client to the server in one go, so that the server could evaluate the 'final PCR quote' immediately. However, if a client can indicate 'I will send more the next time and I have this much more to transfer', and the server allows this multiple times (until all the 1MB shards of the 20MB log are transferred), then that kernel AK key would not be necessary, since presumably the "final PCR quote", created by a user space client, would resolve whether the entire log is trustworthy.
See my responses to James today [2].

[2] https://lore.kernel.org/all/72e39852-1ff1-c7f6-ac7e-593e8142d...@linux.microsoft.com/

I think James was proposing one AK, possibly persisted in the TPM's NVRAM. Still, the fewer keys involved in this, the better...

   Stefan

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
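The snapshot/replay scheme discussed in this thread hinges on one property: replaying the remaining log events on top of the PCR values recorded in the snapshot_aggregate event must reproduce the same final PCR values as replaying the full log. A minimal sketch of that property, assuming a SHA-256 PCR bank and hypothetical event digests (real verification would compare the result against an AK-signed quote):

```python
import hashlib

def pcr_extend(pcr: bytes, event_digest: bytes) -> bytes:
    """TPM-style PCR extend: new = SHA-256(old || digest)."""
    return hashlib.sha256(pcr + event_digest).digest()

def replay(start_pcr: bytes, event_digests) -> bytes:
    pcr = start_pcr
    for d in event_digests:
        pcr = pcr_extend(pcr, d)
    return pcr

# A SHA-256 PCR starts out as all zeroes.
initial = bytes(32)

# Hypothetical measurement digests from an IMA log.
events = [hashlib.sha256(b"event-%d" % i).digest() for i in range(3)]

# Full replay from the very beginning of the log...
final_full = replay(initial, events)

# ...matches replaying only the tail on top of the PCR values that a
# snapshot_aggregate event recorded after the first event.
snapshot_pcr = replay(initial, events[:1])
final_from_snapshot = replay(snapshot_pcr, events[1:])

assert final_full == final_from_snapshot
```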
[PATCH v27 2/8] crash: add generic infrastructure for crash hotplug support
To support crash hotplug, a mechanism is needed to update the crash elfcorehdr upon CPU or memory changes (e.g. hot un/plug or off/onlining). The crash elfcorehdr describes the CPUs and memory to be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The crash hotplug elfcorehdr update has no explicit ordering requirement (relative to other cpuhp states), so it meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a new state for crash hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE group, just prior to the STARTING group, which is very close to the CPU starting up in a plug/online situation, or stopping in an unplug/offline situation. This minimizes the window of time during an actual plug/online or unplug/offline situation in which the elfcorehdr would be inaccurate.

Note that for a CPU being unplugged or offlined, the CPU will still be present in the list of CPUs generated by crash_prepare_elf64_headers(). However, there is no need to explicitly omit the CPU; see the justification in 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event(), which performs needed tasks and then dispatches the event to the architecture-specific arch_crash_handle_hotplug_event() to update the elfcorehdr with the current state of CPUs and memory. During the process, the kexec_lock is held.
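The flow described above — hotplug event, generic handler taking the lock, then arch-specific elfcorehdr update — can be sketched structurally in userspace (an illustration only; the real kernel code registers cpuhp callbacks and a memory notifier and holds the kexec_lock; the action values mirror the KEXEC_CRASH_HP_* defines added by this patch):

```python
import threading

# Action values mirroring the KEXEC_CRASH_HP_* defines in the patch.
KEXEC_CRASH_HP_ADD_CPU = 1
KEXEC_CRASH_HP_REMOVE_CPU = 2
KEXEC_CRASH_HP_ADD_MEMORY = 3
KEXEC_CRASH_HP_REMOVE_MEMORY = 4

kexec_lock = threading.Lock()   # stand-in for the kernel's kexec_lock
updates = []                    # record of elfcorehdr regenerations

def arch_crash_handle_hotplug_event(hp_action):
    # In the kernel, the arch handler regenerates the elfcorehdr from
    # the current set of CPUs and memory regions.
    updates.append(hp_action)

def crash_handle_hotplug_event(hp_action):
    # Generic handler: housekeeping (locking), then arch dispatch.
    with kexec_lock:
        arch_crash_handle_hotplug_event(hp_action)

# A CPU coming online and a memory block going offline both funnel
# through the same generic handler.
crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU)
crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY)
print(updates)  # [1, 4]
```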
Signed-off-by: Eric DeVolder
Reviewed-by: Sourabh Jain
Acked-by: Hari Bathini
Acked-by: Baoquan He
---
 include/linux/crash_core.h | 9 +++
 include/linux/kexec.h | 11 +++
 kernel/Kconfig.kexec | 31
 kernel/crash_core.c | 142 +
 kernel/kexec_core.c | 6 ++
 5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
 int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
 		unsigned long long *crash_size, unsigned long long *crash_base);
 
+#define KEXEC_CRASH_HP_NONE		0
+#define KEXEC_CRASH_HP_ADD_CPU		1
+#define KEXEC_CRASH_HP_REMOVE_CPU	2
+#define KEXEC_CRASH_HP_ADD_MEMORY	3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY	4
+#define KEXEC_CRASH_HP_INVALID_CPU	-1U
+
+struct kimage;
+
 #endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
 	struct purgatory_info purgatory_info;
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+	int hp_action;
+	int elfcorehdr_index;
+	bool elfcorehdr_updated;
+#endif
+
 #ifdef CONFIG_IMA_KEXEC
 	/* Virtual address of IMA measurement buffer for kexec syscall */
 	void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
 static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
 #endif
 
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index ff72e45cfaef..d0a9a5392035 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -113,4 +113,35 @@ config CRASH_DUMP
 	  For s390, this option also enables zfcpdump.
 	  See also 
 
+config CRASH_HOTPLUG
+	bool "Update the crash elfcorehdr on system configuration changes"
+	default y
+	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+	depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+	help
+	  Enable direct update to the crash elfcorehdr (which contains
+	  the list of CPUs and memory regions to be dumped upon a crash)
+	  in response to hot plug/unplug or online/offline of CPUs or
+	  memory. This is a much more advanced approach than userspace
+	  attempting that.
+
+	  If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+	int "Specify the maximum number of memory regions for the elfcorehdr"
+	default 8192
+	depends on CRASH_HOTPLUG
+	help
+	  For the kexec_file_load()
[PATCH v27 0/8] crash: Kernel handling of CPU and memory hot un/plug
This series is dependent upon "refactor Kconfig to consolidate KEXEC and CRASH options".
https://lore.kernel.org/lkml/20230712161545.87870-1-eric.devol...@oracle.com/

Once the kdump service is loaded, if changes to CPUs or memory occur, either by hot un/plug or off/onlining, the crash elfcorehdr must also be updated. The elfcorehdr describes to kdump the CPUs and memory in the system, and any inaccuracies can result in a vmcore with missing CPU context or memory regions.

The current solution utilizes udev to initiate an unload-then-reload of the kdump image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by the userspace kexec utility. In the original post I outlined the significant performance problems related to offloading this activity to userspace.

This patchset introduces a generic crash handler that registers with the CPU and memory notifiers. Upon CPU or memory changes, from either hot un/plug or off/onlining, this generic handler is invoked and performs important housekeeping, for example obtaining the appropriate lock, and then invokes an architecture specific handler to do the appropriate elfcorehdr update.

Note the description in patches 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that enables further optimizations related to CPU plug/unplug/online/offline performance of elfcorehdr updates.

In the case of x86_64, the arch specific handler generates a new elfcorehdr and overwrites the old one in memory; thus no involvement with userspace is needed.

To realize the benefits/test this patchset, one must make a couple of minor changes to userspace:

- Prevent udev from updating the kdump crash kernel on hot un/plug changes.
  Add the following as the first lines to the RHEL udev rule file
  /usr/lib/udev/rules.d/98-kexec.rules:

  # The kernel updates the crash elfcorehdr for CPU and memory changes
  SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
  SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

  With this changeset applied, the two rules evaluate to false for CPU and memory change events and thus skip the userspace unload-then-reload of kdump.

- Change to kexec_file_load for loading the kdump kernel:

  Eg. on RHEL, in /usr/bin/kdumpctl, change to:

  standard_kexec_args="-p -d -s"

  which adds the -s to select the kexec_file_load() syscall.

This kernel patchset also supports kexec_load() with a modified kexec userspace utility. A working changeset to the kexec userspace utility is posted to the kexec-tools mailing list here:
http://lists.infradead.org/pipermail/kexec/2023-May/027049.html

To use the kexec-tools patch, apply, build and install kexec-tools, then change kdumpctl's standard_kexec_args to replace the -s with --hotplug. The removal of -s reverts to the kexec_load syscall, and the addition of --hotplug invokes the changes put forth in the kexec-tools patch.

Regards,
eric

---
v27: 11aug2023
- Rebased onto 6.5.0-rc5
- The linux-next and akpm test bots found a build error when just PROC_KCORE is configured (with no KEXEC or CRASH), which resulted in CRASH_CORE enabled by itself. To solve, the struct crash_mem moved from include/linux/kexec.h to include/linux/crash_core.h. Similarly, the crash_notes also moved from kernel/kexec.c to kernel/crash_core.c.
- Minor adjustment to arch/x86/kernel/crash.c was also needed to avoid unused function build errors for just PROC_KCORE.
- Spot testing of several architectures did not reveal any further build problems (PROC_KCORE, KEXEC, CRASH_DUMP, CRASH_HOTPLUG).
v26: 4aug2023
https://lore.kernel.org/lkml/20230804210359.8321-1-eric.devol...@oracle.com/
- Rebased onto 6.5.0-rc4
- Dropped the refactor of files drivers/base/cpu|memory.c as unrelated to this series.
- Minor corrections to documentation, per Randy Dunlap and GregKH.

v25: 29jun2023
https://lore.kernel.org/lkml/20230629192119.6613-1-eric.devol...@oracle.com/
- Properly applied IS_ENABLED() to the function bodies of callbacks in drivers/base/cpu|memory.c.
- Re-ran compile and run-time testing of the impacted attributes for both enabled and not enabled config settings.

v24: 28jun2023
https://lore.kernel.org/lkml/20230628185215.40707-1-eric.devol...@oracle.com/
- Rebased onto 6.4.0
- Included Documentation/ABI/testing entries for the new sysfs crash_hotplug attributes, per Greg Kroah-Hartman.
- Refactored drivers/base/cpu|memory.c to use the .is_visible() method for attributes, per Greg Kroah-Hartman.
- Retained all existing Acks and RBs as the few changes as a result of Greg's requests were trivial.

v23: 12jun2023
https://lore.kernel.org/lkml/20230612210712.683175-1-eric.devol...@oracle.com/
- Rebased onto 6.4.0-rc6
- Refactored Kconfig, per Thomas. See series:
  https://lore.kernel.org/lkml/20230612172805.681179-1-eric.devol...@oracle.com/
- Reworked commit messages to conform to
[PATCH v27 8/8] x86/crash: optimize CPU changes
crash_prepare_elf64_headers() writes into the elfcorehdr an ELF PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs (ie. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the kexec_file_load() syscall. Since crash_prepare_elf64_headers() wrote the initial elfcorehdr, no update to the elfcorehdr is needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via the kexec_load() syscall. At least one memory or CPU change must occur to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr. Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into crash_handle_hotplug_event() as it would prevent the arch-specific handler from running for CPU changes. This would break PPC, for example, which needs to update other information besides the elfcorehdr on CPU changes.

Signed-off-by: Eric DeVolder
Reviewed-by: Sourabh Jain
Acked-by: Hari Bathini
Acked-by: Baoquan He
---
 arch/x86/kernel/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index caf22bcb61af..18d2a18d1073 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -467,6 +467,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
 	unsigned long mem, memsz;
 	unsigned long elfsz = 0;
 
+	/*
+	 * As crash_prepare_elf64_headers() has already described all
+	 * possible CPUs, there is no need to update the elfcorehdr
+	 * for additional CPU changes.
+	 */
+	if ((image->file_mode || image->elfcorehdr_updated) &&
+	    ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+	     (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+		return;
+
 	/*
 	 * Create the new elfcorehdr reflecting the changes to CPU and/or
 	 * memory resources.
-- 
2.31.1
[PATCH v27 7/8] crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
The function crash_prepare_elf64_headers() generates the elfcorehdr, which describes the CPUs and memory in the system for the crash kernel. In particular, it writes out ELF PT_NOTEs for memory regions and the CPUs in the system.

With respect to the CPUs, the current implementation utilizes for_each_present_cpu(), which means that as CPUs are added and removed, the elfcorehdr must again be updated to reflect the new set of CPUs.

The reasoning behind the move to use for_each_possible_cpu() is:

- At kernel boot time, all percpu crash_notes are allocated for all possible CPUs; that is, crash_notes are not allocated dynamically when CPUs are plugged/unplugged. Thus the crash_notes for each possible CPU are always available.

- The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU. Changing to for_each_possible_cpu() is valid as the crash_notes pointed to by each CPU PT_NOTE are present and always valid.

Furthermore, examining a common crash processing path of:

 kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
  elfcorehdr      /proc/vmcore     vmcore

reveals how the ELF CPU PT_NOTEs are utilized:

- Upon panic, each CPU is sent an IPI and shuts itself down, recording its state in its crash_notes. When all CPUs are shutdown, the crash kernel is launched with a pointer to the elfcorehdr.

- The crash kernel via linux/fs/proc/vmcore.c does not examine or use the contents of the PT_NOTEs; it exposes them via /proc/vmcore.

- The makedumpfile utility uses /proc/vmcore and reads the CPU PT_NOTEs to craft a nr_cpus variable, which is reported in a header but otherwise generally unused. Makedumpfile creates the vmcore.

- The 'crash' dump analyzer does not appear to reference the CPU PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask symbols and directly examines those structure contents from vmcore memory. From that information it is able to determine which CPUs are present and online, and locate the corresponding crash_notes.
Said differently, it appears that the 'crash' analyzer does not rely on the ELF PT_NOTEs for CPUs; rather it obtains the information directly via kernel symbols and the memory within the vmcore. (There may be other vmcore generating and analysis tools that do use these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most common solution.)

This results in the benefit of having all CPUs described in the elfcorehdr, and therefore reduces the need to re-generate the elfcorehdr on CPU changes, at the small expense of an additional 56 bytes per PT_NOTE for not-present-but-possible CPUs.

On systems where the kexec_file_load() syscall is utilized, all the above is valid. On systems where the kexec_load() syscall is utilized, there may be the need for the elfcorehdr to be regenerated once. The reason being that some archs only populate the 'present' CPUs from the /sys/devices/system/cpu entries, which the userspace 'kexec' utility uses to generate the userspace-supplied elfcorehdr. In this situation, one memory or CPU change will rewrite the elfcorehdr via the crash_prepare_elf64_headers() function, and from then on all possible CPUs will be described, just as with the kexec_file_load() syscall.
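The 56-bytes-per-CPU figure above is simply the size of an ELF64 program header; a quick check of the Elf64_Phdr layout:

```python
import struct

# Elf64_Phdr layout: p_type, p_flags (32-bit each), then p_offset,
# p_vaddr, p_paddr, p_filesz, p_memsz, p_align (64-bit each).
ELF64_PHDR = struct.Struct("<II6Q")

print(ELF64_PHDR.size)  # 56 bytes per PT_NOTE program header
```

So describing, say, 480 not-present-but-possible CPUs costs well under 32 KiB of elfcorehdr space.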
Suggested-by: Sourabh Jain
Signed-off-by: Eric DeVolder
Reviewed-by: Sourabh Jain
Acked-by: Hari Bathini
Acked-by: Baoquan He
---
 kernel/crash_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index fa918176d46d..7378b501fada 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
 	ehdr->e_phentsize = sizeof(Elf64_Phdr);
 
-	/* Prepare one phdr of type PT_NOTE for each present CPU */
-	for_each_present_cpu(cpu) {
+	/* Prepare one phdr of type PT_NOTE for each possible CPU */
+	for_each_possible_cpu(cpu) {
 		phdr->p_type = PT_NOTE;
 		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
 		phdr->p_offset = phdr->p_paddr = notes_addr;
-- 
2.31.1
[PATCH v27 5/8] x86/crash: add x86 crash hotplug support
When CPU or memory is hot un/plugged, or off/onlined, the crash elfcorehdr, which describes the CPUs and memory in the system, must also be updated. A new elfcorehdr is generated from the available CPUs and memory and replaces the existing elfcorehdr. The segment containing the elfcorehdr is identified at run-time in crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr from the segment digest') or boot_params (as the elfcorehdr= capture kernel command line parameter pointer remains unchanged and correct) are needed, just the elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a growing number of CPU and memory resources. For kexec_load(), the userspace kexec utility needs to size the elfcorehdr segment in the same/similar manner.

To accommodate the kexec_load() syscall in the absence of kexec_file_load() syscall support, prepare_elf_headers() and its dependents are moved outside of CONFIG_KEXEC_FILE.
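A rough sketch of the worst-case elfcorehdr sizing described above. The NR_CPUS value and the extra-header count are illustrative assumptions; CRASH_MAX_MEMORY_RANGES defaults to 8192 per the Kconfig in this series:

```python
import struct

ELF64_EHDR_SZ = struct.calcsize("<16sHHIQQQIHHHHHH")  # 64 bytes
ELF64_PHDR_SZ = struct.calcsize("<II6Q")              # 56 bytes

def worst_case_elfcorehdr_size(nr_cpus: int, max_mem_ranges: int) -> int:
    # One PT_NOTE per possible CPU, one PT_LOAD per memory range, plus a
    # couple of extra headers (assumed here, e.g. vmcoreinfo and the
    # kernel text mapping).
    extra = 2
    phdrs = nr_cpus + max_mem_ranges + extra
    return ELF64_EHDR_SZ + phdrs * ELF64_PHDR_SZ

# Hypothetical config: NR_CPUS=512, CRASH_MAX_MEMORY_RANGES=8192.
print(worst_case_elfcorehdr_size(512, 8192))  # 487600 bytes, under 1 MiB
```

Sizing for the maximums up front is what allows later hotplug updates to rewrite the elfcorehdr in place without reallocating the segment.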
Signed-off-by: Eric DeVolder Reviewed-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He --- arch/x86/Kconfig | 3 + arch/x86/include/asm/kexec.h | 15 + arch/x86/kernel/crash.c | 103 --- 3 files changed, 114 insertions(+), 7 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7082fc10b346..ffc95c3d6abd 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2069,6 +2069,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP config ARCH_SUPPORTS_CRASH_DUMP def_bool X86_64 || (X86_32 && HIGHMEM) +config ARCH_SUPPORTS_CRASH_HOTPLUG + def_bool y + config PHYSICAL_START hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP) default "0x100" diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 5b77bbc28f96..9143100ea3ea 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void); extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss; extern void kdump_nmi_shootdown_cpus(void); +#ifdef CONFIG_CRASH_HOTPLUG +void arch_crash_handle_hotplug_event(struct kimage *image); +#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event + +#ifdef CONFIG_HOTPLUG_CPU +static inline int crash_hotplug_cpu_support(void) { return 1; } +#define crash_hotplug_cpu_support crash_hotplug_cpu_support +#endif + +#ifdef CONFIG_MEMORY_HOTPLUG +static inline int crash_hotplug_memory_support(void) { return 1; } +#define crash_hotplug_memory_support crash_hotplug_memory_support +#endif +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_KEXEC_H */ diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index cdd92ab43cda..c70a111c44fa 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -158,8 +158,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs) crash_save_cpu(regs, safe_smp_processor_id()); } -#ifdef CONFIG_KEXEC_FILE - static int get_nr_ram_ranges_callback(struct resource *res, void *arg) { unsigned int *nr_ranges = 
arg; @@ -231,7 +229,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg) /* Prepare elf headers. Return addr and size */ static int prepare_elf_headers(struct kimage *image, void **addr, - unsigned long *sz) + unsigned long *sz, unsigned long *nr_mem_ranges) { struct crash_mem *cmem; int ret; @@ -249,6 +247,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr, if (ret) goto out; + /* Return the computed number of memory ranges, for hotplug usage */ + *nr_mem_ranges = cmem->nr_ranges; + /* By default prepare 64bit headers */ ret = crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz); @@ -257,6 +258,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr, return ret; } +#ifdef CONFIG_KEXEC_FILE static int add_e820_entry(struct boot_params *params, struct e820_entry *entry) { unsigned int nr_e820_entries; @@ -371,18 +373,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) int crash_load_segments(struct kimage *image) { int ret; + unsigned long pnum = 0; struct kexec_buf kbuf = { .image = image, .buf_min = 0, .buf_max = ULONG_MAX, .top_down = false }; /* Prepare elf headers and add a segment */ - ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz); + ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum); if (ret) return ret; - image->elf_headers = kbuf.buffer; - image->elf_headers_sz = kbuf.bufsz; + image->elf_headers = kbuf.buffer; + image->elf_headers_sz = kbuf.bufsz; + kbuf.memsz =
[PATCH v27 3/8] kexec: exclude elfcorehdr from the segment digest
When a crash kernel is loaded via the kexec_file_load() syscall, the kernel places the various segments (ie crash kernel, crash initrd, boot_params, elfcorehdr, purgatory, etc) in memory. For those architectures that utilize purgatory, a hash digest of the segments is calculated for integrity checking. The digest is embedded into the purgatory image prior to being placed in memory. Updates to the elfcorehdr in response to CPU and memory changes would cause the purgatory integrity checking to fail (at crash time, with no vmcore created). Therefore, the elfcorehdr segment is explicitly excluded from the purgatory digest, enabling updates to the elfcorehdr while also avoiding the need to recompute the hash digest and reload purgatory. Suggested-by: Baoquan He Signed-off-by: Eric DeVolder Reviewed-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He --- kernel/kexec_file.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 453b7a513540..e2ec9d7b9a1f 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -726,6 +726,12 @@ static int kexec_calculate_store_digests(struct kimage *image) for (j = i = 0; i < image->nr_segments; i++) { struct kexec_segment *ksegment; +#ifdef CONFIG_CRASH_HOTPLUG + /* Exclude elfcorehdr segment to allow future changes via hotplug */ + if (j == image->elfcorehdr_index) + continue; +#endif + ksegment = &image->segment[i]; /* * Skip purgatory as it will be modified once we put digest -- 2.31.1 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH v27 4/8] crash: memory and CPU hotplug sysfs attributes
Introduce the crash_hotplug attribute for memory and CPUs for use by userspace. These attributes directly facilitate the udev rule for managing userspace re-loading of the crash kernel upon hot un/plug changes. For memory, expose the crash_hotplug attribute to the /sys/devices/system/memory directory. For example: # udevadm info --attribute-walk /sys/devices/system/memory/memory81 looking at device '/devices/system/memory/memory81': KERNEL=="memory81" SUBSYSTEM=="memory" DRIVER=="" ATTR{online}=="1" ATTR{phys_device}=="0" ATTR{phys_index}=="0051" ATTR{removable}=="1" ATTR{state}=="online" ATTR{valid_zones}=="Movable" looking at parent device '/devices/system/memory': KERNELS=="memory" SUBSYSTEMS=="" DRIVERS=="" ATTRS{auto_online_blocks}=="offline" ATTRS{block_size_bytes}=="800" ATTRS{crash_hotplug}=="1" For CPUs, expose the crash_hotplug attribute to the /sys/devices/system/cpu directory. For example: # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0 looking at device '/devices/system/cpu/cpu0': KERNEL=="cpu0" SUBSYSTEM=="cpu" DRIVER=="processor" ATTR{crash_notes}=="277c38600" ATTR{crash_notes_size}=="368" ATTR{online}=="1" looking at parent device '/devices/system/cpu': KERNELS=="cpu" SUBSYSTEMS=="" DRIVERS=="" ATTRS{crash_hotplug}=="1" ATTRS{isolated}=="" ATTRS{kernel_max}=="8191" ATTRS{nohz_full}==" (null)" ATTRS{offline}=="4-7" ATTRS{online}=="0-3" ATTRS{possible}=="0-7" ATTRS{present}=="0-3" With these sysfs attributes in place, it is possible to efficiently instruct the udev rule to skip crash kernel reloading for kernels configured with crash hotplug support. 
For example, the following is the proposed udev rule change for RHEL system 98-kexec.rules (as the first lines of the rule file): # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" When examined in the context of 98-kexec.rules, the above rules test if crash_hotplug is set, and if so, the userspace initiated unload-then-reload of the crash kernel is skipped. CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options. If an architecture supports, for example, memory hotplug but not CPU hotplug, then the /sys/devices/system/memory/crash_hotplug attribute file is present, but the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be present. Thus the udev rule skips userspace processing of memory hot un/plug events, but the udev rule will evaluate false for CPU events, thus allowing userspace to process CPU hot un/plug events (ie the unload-then-reload of the kdump capture kernel). Signed-off-by: Eric DeVolder Reviewed-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He --- Documentation/ABI/testing/sysfs-devices-memory | 8 .../ABI/testing/sysfs-devices-system-cpu | 8 .../admin-guide/mm/memory-hotplug.rst | 8 Documentation/core-api/cpu_hotplug.rst | 18 ++ drivers/base/cpu.c | 13 + drivers/base/memory.c | 13 + include/linux/kexec.h | 8 7 files changed, 76 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory index d8b0f80b9e33..a95e0f17c35a 100644 --- a/Documentation/ABI/testing/sysfs-devices-memory +++ b/Documentation/ABI/testing/sysfs-devices-memory @@ -110,3 +110,11 @@ Description: link is created for memory section 9 on node0. 
/sys/devices/system/node/node0/memory9 -> ../../memory/memory9 + +What: /sys/devices/system/memory/crash_hotplug +Date: Aug 2023 +Contact: Linux kernel mailing list +Description: + (RO) indicates whether or not the kernel directly supports + modifying the crash elfcorehdr for memory hot un/plug and/or + on/offline changes. diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 77942eedf4f6..b52564de2b18 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -687,3 +687,11 @@ Description: (RO) the list of CPUs that are isolated and don't participate in load balancing. These CPUs are set by boot parameter "isolcpus=". + +What: /sys/devices/system/cpu/crash_hotplug +Date: Aug 2023 +Contact: Linux kernel mailing list +Description: + (RO) indicates whether or not the kernel directly supports + modifying the crash elfcorehdr
[PATCH v27 1/8] crash: move a few code bits to setup support of crash hotplug
The crash hotplug support leans on the work for the kexec_file_load() syscall. To also support the kexec_load() syscall, a few bits of code need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are moved out of kexec_file.c and into a common location crash_core.c. No functionality change intended. Signed-off-by: Eric DeVolder Reviewed-by: Sourabh Jain Acked-by: Hari Bathini Acked-by: Baoquan He --- include/linux/kexec.h | 30 +++ kernel/crash_core.c | 182 ++ kernel/kexec_file.c | 181 - 3 files changed, 197 insertions(+), 196 deletions(-) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 22b5cd24f581..811a90e09698 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -105,6 +105,21 @@ struct compat_kexec_segment { }; #endif +/* Alignment required for elf header segment */ +#define ELF_CORE_HEADER_ALIGN 4096 + +struct crash_mem { + unsigned int max_nr_ranges; + unsigned int nr_ranges; + struct range ranges[]; +}; + +extern int crash_exclude_mem_range(struct crash_mem *mem, + unsigned long long mstart, + unsigned long long mend); +extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, + void **addr, unsigned long *sz); + #ifdef CONFIG_KEXEC_FILE struct purgatory_info { /* @@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf) } #endif -/* Alignment required for elf header segment */ -#define ELF_CORE_HEADER_ALIGN 4096 - -struct crash_mem { - unsigned int max_nr_ranges; - unsigned int nr_ranges; - struct range ranges[]; -}; - -extern int crash_exclude_mem_range(struct crash_mem *mem, - unsigned long long mstart, - unsigned long long mend); -extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, - void **addr, unsigned long *sz); - #ifndef arch_kexec_apply_relocations_add /* * arch_kexec_apply_relocations_add - apply relocations of type RELA diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 90ce1dfd591c..b7c30b748a16 100644 ---
a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg) } early_param("crashkernel", parse_crashkernel_dummy); +int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, + void **addr, unsigned long *sz) +{ + Elf64_Ehdr *ehdr; + Elf64_Phdr *phdr; + unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz; + unsigned char *buf; + unsigned int cpu, i; + unsigned long long notes_addr; + unsigned long mstart, mend; + + /* extra phdr for vmcoreinfo ELF note */ + nr_phdr = nr_cpus + 1; + nr_phdr += mem->nr_ranges; + + /* +* kexec-tools creates an extra PT_LOAD phdr for kernel text mapping +* area (for example, 8000 - a000 on x86_64). +* I think this is required by tools like gdb. So same physical +* memory will be mapped in two ELF headers. One will contain kernel +* text virtual addresses and other will have __va(physical) addresses. 
+*/ + + nr_phdr++; + elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr); + elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN); + + buf = vzalloc(elf_sz); + if (!buf) + return -ENOMEM; + + ehdr = (Elf64_Ehdr *)buf; + phdr = (Elf64_Phdr *)(ehdr + 1); + memcpy(ehdr->e_ident, ELFMAG, SELFMAG); + ehdr->e_ident[EI_CLASS] = ELFCLASS64; + ehdr->e_ident[EI_DATA] = ELFDATA2LSB; + ehdr->e_ident[EI_VERSION] = EV_CURRENT; + ehdr->e_ident[EI_OSABI] = ELF_OSABI; + memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD); + ehdr->e_type = ET_CORE; + ehdr->e_machine = ELF_ARCH; + ehdr->e_version = EV_CURRENT; + ehdr->e_phoff = sizeof(Elf64_Ehdr); + ehdr->e_ehsize = sizeof(Elf64_Ehdr); + ehdr->e_phentsize = sizeof(Elf64_Phdr); + + /* Prepare one phdr of type PT_NOTE for each present CPU */ + for_each_present_cpu(cpu) { + phdr->p_type = PT_NOTE; + notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu)); + phdr->p_offset = phdr->p_paddr = notes_addr; + phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t); + (ehdr->e_phnum)++; + phdr++; + } + + /* Prepare one PT_NOTE header for vmcoreinfo */ + phdr->p_type = PT_NOTE; + phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note(); +
[PATCH v27 6/8] crash: hotplug support for kexec_load()
The hotplug support for kexec_load() requires changes to the userspace kexec-tools and a little extra help from the kernel. Given a kdump capture kernel loaded via kexec_load(), and a subsequent hotplug event, the crash hotplug handler finds the elfcorehdr and rewrites it to reflect the hotplug change. That is the desired outcome; however, at kernel panic time, the purgatory integrity check fails (because the elfcorehdr changed), and the capture kernel does not boot and no vmcore is generated. Therefore, the userspace kexec-tools/kexec must indicate to the kernel that the elfcorehdr can be modified (because the kexec excluded the elfcorehdr from the digest, and sized the elfcorehdr memory buffer appropriately). To facilitate hotplug support with kexec_load():

- a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is safe for the kernel to modify the kexec_load()'d elfcorehdr
- the /sys/kernel/crash_elfcorehdr_size node communicates the preferred size of the elfcorehdr memory buffer
- the sysfs crash_hotplug nodes (ie. /sys/devices/system/[cpu|memory]/crash_hotplug) dynamically take into account kexec_file_load() vs kexec_load() and KEXEC_UPDATE_ELFCOREHDR. This is critical so that the udev rule processing of crash_hotplug is all that is needed to determine if the userspace unload-then-reload of the kdump image is to be skipped, or not.
The proposed udev rule change looks like:

# The kernel updates the crash elfcorehdr for CPU and memory changes
SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image updates (with the new udev crash_hotplug rule in place):

 Kernel |Kexec
 -------+-----+----
  Old   |Old  |New
        | a   | a
 -------+-----+----
  New   | a   | b
 -------+-----+----

where kexec 'old' and 'new' indicate whether kexec-tools has the needed modifications for the crash hotplug feature, and kernel 'old' and 'new' indicate whether the kernel supports this crash hotplug feature. Behavior 'a' indicates the unload-then-reload of the entire kdump image. For the kexec 'old' column, the unload-then-reload occurs due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel (with 'new' kexec) does not present the crash_hotplug sysfs node, which leads to the unload-then-reload of the kdump image. Behavior 'b' indicates the desired optimized behavior of the kernel directly modifying the elfcorehdr and avoiding the unload-then-reload of the kdump image. If the udev rule is not updated with the crash_hotplug node check, then no matter which combination of kernel and kexec is new or old, the kdump image continues to be unloaded-then-reloaded on hotplug changes. To fully support the crash hotplug feature, there needs to be a rollout of kernel, kexec-tools and udev rule changes. However, the order of the rollout of these pieces does not matter; kexec_load()'d kdump images still function for hotplug as-is.
Suggested-by: Hari Bathini Signed-off-by: Eric DeVolder Acked-by: Hari Bathini Acked-by: Baoquan He --- arch/x86/include/asm/kexec.h | 11 +++ arch/x86/kernel/crash.c | 27 +++ include/linux/kexec.h| 14 -- include/uapi/linux/kexec.h | 1 + kernel/Kconfig.kexec | 4 kernel/crash_core.c | 31 +++ kernel/kexec.c | 5 + kernel/ksysfs.c | 15 +++ 8 files changed, 102 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h index 9143100ea3ea..3be6a98751f0 100644 --- a/arch/x86/include/asm/kexec.h +++ b/arch/x86/include/asm/kexec.h @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image); #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event #ifdef CONFIG_HOTPLUG_CPU -static inline int crash_hotplug_cpu_support(void) { return 1; } -#define crash_hotplug_cpu_support crash_hotplug_cpu_support +int arch_crash_hotplug_cpu_support(void); +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support #endif #ifdef CONFIG_MEMORY_HOTPLUG -static inline int crash_hotplug_memory_support(void) { return 1; } -#define crash_hotplug_memory_support crash_hotplug_memory_support +int arch_crash_hotplug_memory_support(void); +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support #endif + +unsigned int arch_crash_get_elfcorehdr_size(void); +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size #endif #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index c70a111c44fa..caf22bcb61af 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -427,6 +427,33 @@ int crash_load_segments(struct kimage *image) #undef pr_fmt #define pr_fmt(fmt) "crash hp: " fmt +/* These functions provide the value for the sysfs
Re: [RFC] IMA Log Snapshotting Design Proposal
On 8/10/23 07:12, Stefan Berger wrote: On 8/9/23 21:15, Tushar Sugandhi wrote: Thanks a lot Stefan for looking into this proposal, and providing your feedback. We really appreciate it. On 8/7/23 15:49, Stefan Berger wrote: On 8/1/23 17:21, James Bottomley wrote: On Tue, 2023-08-01 at 12:12 -0700, Sush Shringarputale wrote: [...] Truncating IMA log to reclaim memory is not feasible, since it makes the log go out of sync with the TPM PCR quote, making remote attestation fail. This assumption isn't entirely true. It's perfectly possible to shard an IMA log using two TPM2_Quotes for the beginning and end PCR values to validate the shard. The IMA log could be truncated in the same way (replace the removed part of the log with a TPM2_Quote and AK, so the log still validates from the beginning quote to the end). If you use a TPM2_Quote mechanism to save the log, all you need to do is have the kernel generate the quote with an internal AK. You can keep a record of the quote and the AK at the beginning of the truncated kernel log. If the truncated entries are saved in a file shard it The truncation seems dangerous to me. Maybe not all the scenarios with an attestation client (client = reading logs and quoting) are possible then anymore, such as starting an attestation client only after truncation but a verifier must have witnessed the system's PCRs and log state before the truncation occurred. You are correct that truncation on its own is dangerous. It needs to be accompanied by (a) saving the IMA log data to disk as snapshots, (b) adding the necessary TPM PCR quotes to the current IMA log (as James mentioned above), (c) attestation clients having an ability to send the past snapshots to the remote-attestation-service (verifiers), (d) and verifiers having an ability to use the snapshots along with current IMA logs for the purpose of attestation. All these points are explained in the original RFC email in sections B.1 through B.5 [1]. I read it.
Maybe you have dismissed the PCR update counter already... I am not sure what the PCR update counter is supposed to help with. It won't allow you to detect missing log events but rather will confuse anyone looking at it when my application extends PCR 12 for example, which also affects the update counter. It's a global counter that increases with every PCR extension (except PCR 16, 21, 22, 23) and if used as proposed would prevent any application from extending PCRs. https://github.com/stefanberger/libtpms/blob/master/src/tpm2/PCR.c#L667 https://github.com/stefanberger/libtpms/blob/master/src/tpm2/PCR.c#L629 https://github.com/stefanberger/libtpms/blob/master/src/tpm2/PCR.c#L161 I agree with your point about the TPM PCR update counter, Stefan. I will bring it up in the update counter patch series discussion [1]. [1] https://patchwork.kernel.org/project/linux-integrity/cover/20230801181917.8535-1-tusha...@linux.microsoft.com/ The shards will need to be written into some sort of standard location or a config file needs to be defined, so that everyone knows where to find them and how they are named. We thought about a well-known standard location earlier. Letting the Kernel choose the name/location of the snapshot file comes with its own complexity. Our initial stance is we don't want to handle that at Kernel level, and let the UM client choose the location/naming of the snapshot files. But we are happy to reconsider if the community requests it. I think an ima-buf (or similar) log entry in the IMA log would have to appear at the beginning of the truncated log stating the value of all PCRs that IMA touched (typically only PCR 10 but it can be others). This needs to be done since the quote itself doesn't provide the state of the individual PCRs. This would at least allow an attestation client to re-read the log from the beginning (when it is re-started or started for the first time after the truncation). Agreed.
See the description of snapshot_aggregate in Section B.5 in the original RFC email [1]. However, this alone (without the internal AK quoting the old state) could lead to abuse where I could create totally fake IMA logs stating the state of the PCRs at the beginning (so the verifier syncs its internal PCR state to this state). Yes, the PCR quotes sent to the verifier must be signed by the AK that is trusted by the verifier. That assumption is true regardless of the IMA log snapshotting feature. Further, even with the AK-quote that you propose I may be able to create fake logs and trick a verifier into trusting the machine IFF it doesn't know what kernel this system was booted with, which I may have hacked to provide a fake AK-quote that just happens to match the PCR state presented at the beginning of the log. If the Kernel is compromised, then all bets are off. (Regardless of the IMA log snapshotting feature.) => Can a truncated log be made safe for attestation when the attestation starts only after the
Re: [RFC] IMA Log Snapshotting Design Proposal
On 8/10/23 04:43, James Bottomley wrote: On Wed, 2023-08-09 at 21:43 -0700, Tushar Sugandhi wrote: On 8/8/23 14:41, James Bottomley wrote: On Tue, 2023-08-08 at 16:09 -0400, Stefan Berger wrote: [...] at this point doesn't seem necessary since one presumably can verify the log and PCR states at the end with the 'regular' quote. I don't understand this. A regular quote is a signature over PCR state by an AK. The point about saving the AK in the log for the original is that if the *kernel* truncates the log and saves it to a file, it needs to generate both the AK and the quote for the top of the file shard. That means the AK/EK binding is unverified, but can be verified by loading the AK and running the usual tests, which can only be done if you have the loadable AK, which is why you need it as part of the log saving proposal. I had this question about the usability of AK/EK in this context. Although AK/EK + PCR quote is needed to verify the snapshot shards / IMA logs are not tampered with, I am still not sure why AK/EK needs to be part of the shard/IMA log. The client sending AK/EK to attestation service separately would still serve the purpose, right? Well, the EK doesn't need to be part of the log: it's just a permanent part of the TPM identity. To verify the log, you need access to the TPM that was used to create it, so that's the point at which you get the EK. Agreed. EK is part of TPM identity. But to verify the log, you don’t need to have physical access to the TPM. You need to have access to just public part of EK and AK/AIK certs (TPM on the system would sign the quote using the private AK). I believe you already know this, just stating for the sake of completing the conversation. :) An AK is simply a TPM generated signing key (meaning the private part of the key is secured by the TPM and known to no-one else). 
In the literature a TPM generated signing key doesn't become an Attestation Key until it's been verified using an EK property (either a certify for a signing EK or a make/activate credential round trip for the more usual encryption EK). Yes. That aligns with my understanding of EK/AK in general. Thanks for describing. So the proposal is for each quote that's used to verify a log shard is that the TPM simply generate a random signing key and use that to sign I believe you are suggesting creating a new AK each time you want to sign a PCR quote. It is doable in TPM 2.0, and it provides benefits like privacy and untraceability. But it comes with its own costs: the cost of generating a new AK each time you want to sign, maintaining a mapping of AKs and their signed quotes, maintaining multiple public AK certs, etc. the quote. You need to save the TPM form of the generated key so it can be loaded later and the reason for that is you can do the EK verification at any time after the quote was given by loading the saved key and running the verification protocol. In the normal attestation you do the EK verification of the AK *before* the quote, but there's no property of the quote that depends on this precedence provided you do the quote with a TPM generated signing key. Yes. The underlying point is that the usual way an EK verifies an AK requires a remote observer, which the kernel won't have, so the kernel Agreed. must do all its stuff locally (generate key, get quote) and then at I believe the Kernel doesn't have to generate a key while taking the snapshot. In the current proposal, the Kernel can simply get the (unsigned) PCR quote and log it in the IMA log as part of the snapshot_aggregate event. We don't need to sign the quote while logging it in the IMA log as snapshot_aggregate. And the act of logging that event in the IMA log extends the PCR bank. Sometime later, when a remote observer wants to validate the log, it can do so by comparing against the PCR quote that was signed at that point.
some point later the system can become remote connected and prove to whatever external entity that the log shard is valid. So we have to have all the components necessary for that proof: the log shard, the quote and the TPM form of the AK. For instance, PCR quotes will be signed by AK. So as long as the verifier trusts the AK/EK, Right, but if you're sharding a log, the kernel doesn't know if a verifier has been in contact yet. The point of the protocol above is to make that not matter. The verifier can contact the system after the log has been saved and the verification will still work. The Kernel doesn’t need to know. And it still doesn’t matter. The benefit of our approach is the PCR values that represent the previous snapshot(shard) is now logged in the IMA log as snapshot_aggregate, and the PCRs are extended again as part of logging that event in IMA log. it can verify the quotes are not tampered with. Replaying IMA log/snapshot can produce the PCR quotes which can be matched with signed PCR quotes. If they match, then the verifier can conclude that the IMA log is
RE: kexec reports "Cannot get kernel _text symbol address" on arm64 platform
> -Original Message- > From: b...@redhat.com > Sent: Wednesday, August 9, 2023 7:42 AM > To: Pandey, Radhey Shyam ; > pi...@redhat.com > Cc: kexec@lists.infradead.org; linux-ker...@vger.kernel.org > Subject: Re: kexec reports "Cannot get kernel _text symbol address" on > arm64 platform > > On 08/08/23 at 07:17pm, Pandey, Radhey Shyam wrote: > > Hi, > > > > I am trying to bring up kdump on arm64 platform[1]. But I get "Cannot get > kernel _text symbol address". > > > > Is there some Dump-capture kernel config options that I am missing? > > > > FYI, copied below complete kexec debug log. > > > > [1]: https://www.xilinx.com/products/boards-and-kits/vck190.html > > Your description isn't clear. You saw the printing, then your kdump kernel > loading succeeded or not? > > If no, have you tried applying Pingfan's patchset and still saw the issue? > > [PATCHv7 0/5] arm64: zboot support > https://lore.kernel.org/all/20230803024152.11663-1-pi...@redhat.com/T/#u I was able to proceed further with loading with crash kernel on triggering system crash. echo c > /proc/sysrq-trigger But when I copy /proc/vmcore it throws memory abort. Also I see size of /proc/vmcore really huge (18446603353488633856). Any possible guess on what could be wrong? [ 80.733523] Starting crashdump kernel... [ 80.737435] Bye! 
[0.00] Booting Linux on physical CPU 0x01 [0x410fd083] [0.00] Linux version 6.5.0-rc4-ge28001fb4e07 (radheys@xhdradheys41) (aarch64-xilinx-linux-gcc.real (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0.20220819) #23 SMP Fri Aug 11 16:25:34 IST 2023 xilinx-vck190-20232:/run/media/mmcblk0p1# cat /proc/meminfo | head MemTotal:2092876 kB MemFree: 1219928 kB MemAvailable:1166004 kB Buffers: 32 kB Cached: 756952 kB SwapCached:0 kB Active: 1480 kB Inactive: 24164 kB Active(anon): 1452 kB Inactive(anon):24160 kB xilinx-vck190-20232:/run/media/mmcblk0p1# cp /proc/vmcore dump [ 975.284865] Unable to handle kernel level 3 address size fault at virtual address 80008d7cf000 [ 975.293871] Mem abort info: [ 975.296669] ESR = 0x9603 [ 975.300425] EC = 0x25: DABT (current EL), IL = 32 bits [ 975.305738] SET = 0, FnV = 0 [ 975.308788] EA = 0, S1PTW = 0 [ 975.311925] FSC = 0x03: level 3 address size fault [ 975.316888] Data abort info: [ 975.319763] ISV = 0, ISS = 0x0003, ISS2 = 0x [ 975.325245] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 975.330292] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 975.335599] swapper pgtable: 4k pages, 48-bit VAs, pgdp=05016ef6b000 [ 975.342297] [80008d7cf000] pgd=1501eddfe003, p4d=1501eddfe003, pud=1501eddfd003, pmd=15017b695003, pte=00687fff84000703 [ 975.354827] Internal error: Oops: 9603 [#4] SMP [ 975.360392] Modules linked in: 3 975. 
[ 975.363440] CPU: 0 PID: 664 Comm: cp Tainted: G D 6.5.0-rc4-ge28001fb4e07 #23 [ 975.372822] Hardware name: Xilinx Versal vck190 Eval board revA (DT) [ 975.379165] pstate: a005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 975.386119] pc : __memcpy+0x110/0x230 [ 975.389783] lr : _copy_to_iter+0x3d8/0x4d0 [ 975.393874] sp : 80008dc939a0 [ 975.397178] x29: 80008dc939a0 x28: 05013c1bea30 x27: 1000 [ 975.404309] x26: 1000 x25: 1000 x24: 80008d7cf000 [ 975.411440] x23: 0400 x22: 80008dc93ba0 x21: 1000 [ 975.418570] x20: x19: 1000 x18: [ 975.425699] x17: x16: x15: 0140 [ 975.432829] x14: 8500a9919000 x13: 0041 x12: fffef6831000 [ 975.439958] x11: 80008d9cf000 x10: x9 : [ 975.447088] x8 : 80008d7d x7 : 0501addfd358 x6 : 0401 [ 975.454217] x5 : 0501370e9000 x4 : 80008d7d x3 : [ 975.461346] x2 : 1000 x1 : 80008d7cf000 x0 : 0501370e8000 [ 975.468476] Call trace: [ 975.470912] __memcpy+0x110/0x230 [ 975.474221] copy_oldmem_page+0x70/0xac [ 975.478050] read_from_oldmem.part.0+0x120/0x188 [ 975.482663] read_vmcore+0x14c/0x238 [ 975.486231] proc_reg_read_iter+0x84/0xd8 [ 975.490233] copy_splice_read+0x160/0x288 [ 975.494236] vfs_splice_read+0xac/0x10c [ 975.498063] splice_direct_to_actor+0xa4/0x26c [ 975.502498] do_splice_direct+0x90/0xdc [ 975.506325] do_sendfile+0x344/0x454 [ 975.509892] __arm64_sys_sendfile64+0x134/0x140 [ 975.514415] invoke_syscall+0x54/0x124 [ 975.518157] el0_svc_common.constprop.0+0xc4/0xe4 [ 975.522854] do_el0_svc+0x38/0x98 [ 975.526162] el0_svc+0x2c/0x84 [ 975.529211] el0t_64_sync_handler+0x100/0x12c [ 975.533562] el0t_64_sync+0x190/0x194 [ 975.537218] Code: cb01000e b4fffc2e eb0201df 540004a3 (a940342c) [ 975.543302] ---[ end trace ]---
Re: [RFC] IMA Log Snapshotting Design Proposal
Hi Sush, Tushar, On Tue, 2023-08-01 at 12:12 -0700, Sush Shringarputale wrote: > > | A. Problem Statement | > > Depending on the IMA policy, the IMA log can consume a lot of Kernel > memory on > the device. For instance, the events for the following IMA policy > entries may > need to be measured in certain scenarios, but they can also lead to a > verbose > IMA log when the device is running for a long period of time. > ┌───┐ > │# PROC_SUPER_MAGIC │ > │measure fsmagic=0x9fa0 │ > │# SYSFS_MAGIC │ > │measure fsmagic=0x62656572 │ > │# DEBUGFS_MAGIC│ > │measure fsmagic=0x64626720 │ > │# TMPFS_MAGIC │ > │measure fsmagic=0x01021994 │ > │# RAMFS_MAGIC │ > │measure fsmagic=0x858458f6 │ > │# SECURITYFS_MAGIC │ > │measure fsmagic=0x73636673 │ > │# OVERLAYFS_MAGIC │ > │measure fsmagic=0x794c7630 │ > │# log, audit or tmp files │ > │measure obj_type=var_log_t │ > │measure obj_type=auditd_log_t │ > │measure obj_type=tmp_t │ > └───┘ > > Secondly, certain devices are configured to take Kernel updates using Kexec > soft-boot. The IMA log from the previous Kernel gets carried over and the > Kernel memory consumption problem worsens when such devices undergo multiple > Kexec soft-boots over a long period of time. > > The above two scenarios can cause IMA log to grow and consume Kernel memory. > > In addition, a large IMA log can add pressure on the network bandwidth when > the attestation client sends it to remote-attestation-service. > > Truncating IMA log to reclaim memory is not feasible, since it makes the > log go > out of sync with the TPM PCR quote making remote attestation fail. > > A sophisticated solution is required which will help relieve the memory > pressure on the device and continue supporting remote attestation without > disruptions. If the problem is kernel memory, then using a single tmpfs file has already been proposed [1]. As entries are added to the measurement list, they are copied to the tmpfs file and removed from kernel memory. 
Userspace would still access the measurement list via the existing
securityfs file. The IMA measurement list is a sequential file, allowing
it to be read from an offset. How much or how little of the measurement
list is read by the attestation client and sent to the attestation
server is up to the attestation client/server.

If the problem is not kernel memory, but memory pressure in general,
then instead of a tmpfs file, the measurement list could similarly be
copied to a single persistent file [1].

> ---
>
> | B. Proposed Solution |
>
> In this document, we propose an enhancement to the IMA subsystem to
> improve the long-running performance by snapshotting the IMA log, while
> still providing mechanisms to verify its integrity using the PCR
> quotes.
>
> The remainder of the document describes details of the proposed
> solution in the following sub-sections.
> - High-level Work-flow
> - Snapshot Triggering Mechanism
> - Design Choices for Storing Snapshots
> - Attestation-Client and Remote-Attestation-Service Side Changes
> - Example Walk-through
> - Open Questions
>
> ---
>
> | B.1 High-level Work-flow |
>
> Pre-requisites:
> - IMA Integrity guarantees are maintained.
>
> The proposed high level work-flow of IMA log snapshotting is as
> follows:
> - A user-mode process will trigger the snapshot by opening a file in
>   SysFS, say /sys/kernel/security/ima/snapshot (referred to as
>   sysk_ima_snapshot_file here onwards).

Please fix the mailer so that it doesn't wrap sentences. Adding blank
lines between bullets would improve readability.

> - The Kernel will get the current TPM PCR values and PCR update counter
>   [2] and store them as template data in a new IMA event
>   "snapshot_aggregate". This event will be measured by IMA using
>   critical data measurement functionality [1]. Recording regular IMA
>   events will be paused while "snapshot_aggregate" is being computed
>   using the existing IMA mutex lock.
> - Once the "snapshot_aggregate" is computed and measured in the IMA
>   log, the prior IMA events will be made available in the
>   sysk_ima_snapshot_file.
> -
Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
On Thu, Aug 10, 2023 at 08:09:27AM +0800, Ming Lei wrote:
> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine,
> since num_possible_cpus becomes 1.
>
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'maxcpus=1', so num_possible_cpus isn't changed, and the kernel just
> boots with a single online cpu. That causes trouble because blk-mq
> limits single queue. And we need to fix case 2.

We need to drop the is_kdump support, and if they want to force less
cpus they need to make nr_cpus=1 work.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 1/2] RISC-V: Use linux,usable-memory-range for crash kernel
Now we use "memory::linux,usable-memory" to indicate the available
memory for the crash kernel. While booting with UEFI, the crash kernel
would use efi.memmap to re-populate memblock, and then the first
kernel's memory would be corrupted. Consequently, the /proc/vmcore file
failed to be created in my local test.

According to the "chosen" dt-schema [1], the available memory for the
crash kernel should be held via the "chosen::linux,usable-memory-range"
property, which will re-cap memblock even after UEFI's re-population.

[1]: https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/chosen.yaml

Signed-off-by: Song Shuai
---
 kexec/arch/riscv/kexec-riscv.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kexec/arch/riscv/kexec-riscv.c b/kexec/arch/riscv/kexec-riscv.c
index fe5dd2d..5aea035 100644
--- a/kexec/arch/riscv/kexec-riscv.c
+++ b/kexec/arch/riscv/kexec-riscv.c
@@ -79,20 +79,20 @@ int load_extra_segments(struct kexec_info *info, uint64_t kernel_base,
 		}
 
 		ret = dtb_add_range_property(&fdt->buf, &fdt->size, start, end,
-					     "memory", "linux,usable-memory");
+					     "chosen", "linux,usable-memory-range");
 		if (ret) {
-			fprintf(stderr, "Couldn't add usable-memory to fdt\n");
+			fprintf(stderr, "Couldn't add usable-memory-range to fdt\n");
 			return ret;
 		}
 		max_usable = end;
 	} else {
 		/*
-		 * Make sure we remove elfcorehdr and usable-memory
+		 * Make sure we remove elfcorehdr and usable-memory-range
 		 * when switching from crash kernel to a normal one.
 		 */
 		dtb_delete_property(fdt->buf, "chosen", "linux,elfcorehdr");
-		dtb_delete_property(fdt->buf, "memory", "linux,usable-memory");
+		dtb_delete_property(fdt->buf, "chosen", "linux,usable-memory-range");
 	}
 
 	/* Do we need to include an initrd image ? */
-- 
2.20.1
[PATCH 2/2] RISC-V: Fix the undeclared ‘EM_RISCV’ build failure
Use the local `elf.h` instead of `linux/elf.h` to fix this build error:

```
kexec/arch/riscv/crashdump-riscv.c:17:13: error: ‘EM_RISCV’ undeclared here (not in a function); did you mean ‘EM_CRIS’?
  .machine = EM_RISCV,
             ^~~~~~~~
             EM_CRIS
```

Signed-off-by: Song Shuai
---
 kexec/arch/riscv/crashdump-riscv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kexec/arch/riscv/crashdump-riscv.c b/kexec/arch/riscv/crashdump-riscv.c
index 3ed4fe3..336d7a7 100644
--- a/kexec/arch/riscv/crashdump-riscv.c
+++ b/kexec/arch/riscv/crashdump-riscv.c
@@ -1,5 +1,5 @@
 #include
-#include <linux/elf.h>
+#include <elf.h>
 #include
 
 #include "kexec.h"
-- 
2.20.1
Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
On 10/08/23 8:31 am, Baoquan He wrote:

On 08/10/23 at 10:06am, Ming Lei wrote:

On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:

On 08/10/23 at 08:09am, Ming Lei wrote:

On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:

I'm starting to sound like a broken record, but we can't just do random
is_kdump checks, and it's not going to get better by resending it again
and again. If kdump kernels limit the number of possible CPUs, it needs
to be reflected in cpu_possible_map and we need to use that information.

Can you look at previous kdump/arch guys' comment about kdump usage &
num_possible_cpus?

https://lore.kernel.org/linux-block/caf+s44ruqswbosy9kmdx35crviqnxoeuvgnsue75bb0y2jg...@mail.gmail.com/
https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/

The point is that kdump kernels do not limit the number of possible
CPUs.

1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
num_possible_cpus becomes 1.

Yes, "nr_cpus=" is strongly suggested in the kdump kernel because
"nr_cpus=" limits the possible cpu numbers, while "maxcpus=" only limits
the cpu number which can be brought up during bootup. We noticed this
difference because a large number of possible cpus will cost more memory
in the kdump kernel, e.g. percpu initialization, even though the kdump
kernel has set "maxcpus=1".

Currently x86 and arm64 both support "nr_cpus=". Pingfan once spent much
effort on patches to add "nr_cpus=" support to ppc64, but it seems the
ppc64 devs and maintainers did not care about it. Finally the patches
were not accepted, and the work was not continued. Now, I am wondering
what the barrier is to adding "nr_cpus=" to the power arch.

Can we reconsider adding 'nr_cpus=' to the power arch since a real issue
occurred in the kdump kernel? If 'nr_cpus=' can be supported on ppc64,
this patchset isn't needed. As for this patchset, can it be accepted so
that no failure in the kdump kernel is seen on arches w/o "nr_cpus="
support? My personal opinion.
IMO 'nr_cpus=' support should be preferred, given it is annoying to
maintain two kinds of implementation for the kdump kernel from the
driver viewpoint. I guess kdump things can be simplified too by
supporting 'nr_cpus=' only.

Yes, 'nr_cpus=' is ideal. Not sure if there were some underlying
concerns that made the power people decide not to support it.

Though "nr_cpus=1" is an ideal solution, the maintainer was not happy
with the patch, as the code changes have impact on the regular boot path
and are likely to cause breakages. So, even if "nr_cpus=1" support for
ppc64 is revived, the change is going to take time to be accepted
upstream.

Also, I see is_kdump_kernel() being used, irrespective of "nr_cpus=1"
support, for other optimizations in drivers for the special dump capture
environment that kdump is. If there is no other downside for driver code
in using is_kdump_kernel(), other than the maintainability aspect, I
think the above changes are worth considering.

Thanks
Hari
[ANNOUNCE] kexec-tools v2.0.27 preparation
Hi all,

I am planning to release kexec-tools v2.0.27 in the next two weeks to
roughly coincide with the release of the v6.5 kernel.

I would like to ask interested parties to send any patches they would
like included in v2.0.27 within one week so they can be considered for
inclusion in an rc release.

For reference, the patches queued up since v2.0.26 are as follows.
Thanks to everyone who has contributed to kexec-tools!

f67c4146d7b5 arm64: Hook up the ZBOOT support as vmlinuz
fc7b83bdf734 arm64: Add ZBOOT PE containing compressed image support
f41c4182b0c4 kexec/zboot: Add arch independent zboot support
1572b91da7c4 kexec: Introduce a member kernel_fd in kexec_info
714fa11590fe kexec/arm64: Simplify the code for zImage
a8de94e5f033 LoongArch: kdump: Set up kernel image segment
4203eaccfa92 kexec: __NR_kexec_file_load is set to undefined on LoongArch
63e9a012112e ppc64: Add elf-ppc64 file types/options and an arch specific flag to man page
806711fca9e9 x86: add devicetree support
29fe5067ed07 kexec: make -a the default
e63fefd4fc35 ppc64: add --reuse-cmdline parameter support
8fc55927f700 kexec-tools 2.0.26.git
Re: [PATCHv7 0/5] arm64: zboot support
On Thu, Aug 03, 2023 at 10:41:47AM +0800, Pingfan Liu wrote:
> From: root
>
> As more complicated capsule kernel formats occur, like zboot, where
> the compressed kernel is stored as a payload, straightforward
> decompression can not meet the demand.
>
> As the first step, on aarch64, read in the kernel file in a probe
> method and let the method itself decide how to unfold the content.
>
> This series consists of two parts:
> [1/5], simplify the current aarch64 image probe
> [2-5/5], return the kernel fd by the image load interface, and let the
> handling of the zboot image be built on it. (Thanks to Dave Young, who
> contributed the original idea and the code.)
>
> To ease the review, a branch is also available at
> https://github.com/pfliu/kexec-tools.git
> branch zbootV7
>
> To: kexec@lists.infradead.org
> Cc: Dave Young
> Cc: ho...@verge.net.au
> Cc: a...@kernel.org
> Cc: jeremy.lin...@arm.com

Thanks everyone, applied.