Re: [PATCH v17 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug

2024-02-29 Sourabh Jain

Hello Baoquan,

On 29/02/24 19:21, Baoquan He wrote:

> Hi Sourabh,
>
> On 02/26/24 at 02:11pm, Sourabh Jain wrote:
>
>> Commit 247262756121 ("crash: add generic infrastructure for crash
>> hotplug support") added a generic infrastructure that allows
>> architectures to selectively update the kdump image component during CPU
>> or memory add/remove events within the kernel itself.
>>
>> This patch series adds a crash hotplug handler for PowerPC and enables
>> support to update the kdump image on CPU/Memory add/remove events.
>>
>> Among the 5 patches in this series, the first two patches make changes
>> to the generic crash hotplug handler to assist PowerPC in adding support
>> for this feature. The last three patches add support for this feature.
>
> The whole series looks good to me. I have acked patches 1 and 2. I'll
> leave the three ppc patches to the ppc experts to review and approve.
> Thanks a lot for your great work.

Thanks for your feedback. I will soon send v18 to fix the two minor
documentation issues, and I look forward to feedback from the PPC
maintainers on the rest of the series.


Appreciate your support!

- Sourabh


Re: [PATCH v17 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug

2024-02-29 Baoquan He
Hi Sourabh,

On 02/26/24 at 02:11pm, Sourabh Jain wrote:
> Commit 247262756121 ("crash: add generic infrastructure for crash
> hotplug support") added a generic infrastructure that allows
> architectures to selectively update the kdump image component during CPU
> or memory add/remove events within the kernel itself.
> 
> This patch series adds a crash hotplug handler for PowerPC and enables
> support to update the kdump image on CPU/Memory add/remove events.
> 
> Among the 5 patches in this series, the first two patches make changes
> to the generic crash hotplug handler to assist PowerPC in adding support
> for this feature. The last three patches add support for this feature.

The whole series looks good to me. I have acked patches 1 and 2. I'll
leave the three ppc patches to the ppc experts to review and approve.
Thanks a lot for your great work.

Thanks
Baoquan



[PATCH v17 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug

2024-02-26 Sourabh Jain
Commit 247262756121 ("crash: add generic infrastructure for crash
hotplug support") added a generic infrastructure that allows
architectures to selectively update the kdump image component during CPU
or memory add/remove events within the kernel itself.

This patch series adds a crash hotplug handler for PowerPC and enables
support to update the kdump image on CPU/Memory add/remove events.

Among the 5 patches in this series, the first two patches make changes
to the generic crash hotplug handler to assist PowerPC in adding support
for this feature. The last three patches add support for this feature.

The following section outlines the problem addressed by this patch
series, along with the current solution, its shortcomings, and the
proposed resolution.

Problem:
========

Due to CPU/Memory hotplug or online/offline events, the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and the FDT
(Flattened Device Tree) of the kdump image become outdated. Consequently,
attempting dump collection with an outdated elfcorehdr or FDT can lead
to failed or inaccurate dump collection.

Going forward, CPU/Memory hotplug or online/offline events are referred
to as CPU/Memory add/remove events.

Existing solution and its shortcomings:
=======================================
The current solution monitors CPU/Memory add/remove events in userspace
using udev rules and, whenever CPU or memory resources change, reloads
the entire kdump image. The kdump image includes the kernel, initrd,
elfcorehdr, FDT, and purgatory. Given that only the elfcorehdr and FDT
become outdated due to CPU/Memory add/remove events, reloading the
entire kdump image is inefficient. More importantly, kdump remains
inactive for a substantial amount of time until the reload completes.

Proposed solution:
==================
Instead of initiating a full kdump image reload from userspace on
CPU/Memory add/remove events, the proposed solution updates only the
affected kdump image components (the elfcorehdr and FDT) within the
kernel itself.
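To illustrate the idea, here is a minimal userspace model of an in-kernel hotplug handler that dispatches on the event type with a switch and touches only the affected component. Everything in it is hypothetical: the enum, struct, and handle_hotplug_event() are illustration names, not the kernel's definitions. The sketch also assumes, for simplicity, that CPU events only need an FDT refresh and memory events only need a new elfcorehdr; the actual PowerPC split is defined by the patches themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes for this sketch (not the kernel's definitions). */
enum hp_event {
	HP_ADD_CPU,
	HP_REMOVE_CPU,
	HP_ADD_MEMORY,
	HP_REMOVE_MEMORY,
};

/* Hypothetical stand-in for a loaded kdump image: counts how often each
 * component is rebuilt so the effect of an event is observable. */
struct kdump_image_model {
	int kernel_reloads;      /* full reloads (what the udev approach does) */
	int elfcorehdr_updates;  /* in-place elfcorehdr regenerations */
	int fdt_updates;         /* in-place FDT refreshes */
};

/* Update only the component affected by the event; the kernel, initrd and
 * purgatory are never reloaded. Returns 0 on success, -1 for an unknown
 * event. */
static int handle_hotplug_event(struct kdump_image_model *img, enum hp_event ev)
{
	assert(img != NULL);

	switch (ev) {
	case HP_ADD_CPU:
	case HP_REMOVE_CPU:
		/* Assumption: CPU events only require refreshing the FDT. */
		img->fdt_updates++;
		return 0;
	case HP_ADD_MEMORY:
	case HP_REMOVE_MEMORY:
		/* Assumption: memory events only require a new elfcorehdr. */
		img->elfcorehdr_updates++;
		return 0;
	default:
		return -1;
	}
}
```

Using a switch keeps each event's handling explicit, which mirrors the v16 change to use a switch case in arch_crash_handle_hotplug_event for PowerPC.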

Git tree for testing:
=====================
https://github.com/sourabhjains/linux/tree/kdump-in-kernel-crash-update-v17

The above tree is rebased on top of linux-next and the following patch series:
https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/

To use this feature, the kdump udev rule must be updated. On RHEL,
add the following two lines at the top of the
"/usr/lib/udev/rules.d/98-kexec.rules" file:

SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

With the above change to the kdump udev rule, kdump reload is avoided
during CPU/Memory add/remove events if this feature is enabled in the
kernel.
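The udev rule above boils down to one decision: skip the userspace reload when the crash_hotplug attribute reads "1". A small C sketch of that same check follows, for example in a monitoring tool. The helper name kdump_reload_needed() is hypothetical, and the attribute path you pass in (such as /sys/devices/system/cpu/crash_hotplug or /sys/devices/system/memory/crash_hotplug) is an assumption to verify on your system.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a crash_hotplug sysfs attribute and report
 * whether a full userspace kdump reload is still needed.
 * Returns 1 if a reload is needed, 0 if the kernel updates the kdump
 * image itself, and -1 if the attribute cannot be read.
 */
static int kdump_reload_needed(const char *attr_path)
{
	char buf[8] = {0};
	FILE *f;

	assert(attr_path != NULL);
	f = fopen(attr_path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Mirrors the udev rule: a value of "1" means skip the reload. */
	return buf[0] == '1' ? 0 : 1;
}
```

A missing attribute returns -1 rather than 1 so that the caller can distinguish "old kernel without the feature" from "feature present but disabled".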

Note: only the kexec_file_load syscall is supported. For kexec_load, minor
changes are required in the kexec tool.

Changelog:
----------
v17:
  - Rebase the patch series on top of the linux-next tree and the patch series below:
https://lore.kernel.org/all/20240213113150.1148276-1-hbath...@linux.ibm.com/
  - Split patch 0003 from v16 into two patches:
   1. Move get_crash_memory_ranges() along with the other *_memory_ranges()
  functions to ranges.c and make them public.
   2. Make the update_cpus_node() function public and move it
  out of file_load_64.c.
  - Keep arch_crash_hotplug_support in crash.c instead of core_64.c [05/06]
  - Use CONFIG_CRASH_MAX_MEMORY_RANGES to find extra elfcorehdr size [06/06]
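For readers wondering what "extra elfcorehdr size" means in practice: the elfcorehdr needs spare room so that later memory hot-add events can append program headers in place. The sketch below shows one plausible way to size that headroom, assuming one Elf64_Phdr per memory range; the helper name and its exact policy are hypothetical, and the real computation lives in patch 06/06. With an illustrative limit of 8192 ranges and 100 ranges present at load time, the headroom is (8192 - 100) * 56 = 453152 bytes, since an Elf64_Phdr is 56 bytes.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

#ifndef PN_XNUM
#define PN_XNUM 0xffff
#endif

/*
 * Hypothetical helper: bytes of headroom to reserve in the elfcorehdr.
 * max_ranges plays the role of CONFIG_CRASH_MAX_MEMORY_RANGES and
 * nr_ranges is the number of memory ranges present at load time.
 */
static size_t extra_elfcorehdr_size(unsigned int max_ranges, unsigned int nr_ranges)
{
	/* e_phnum is 16 bits wide; counts of PN_XNUM or more need an escape. */
	if (max_ranges >= PN_XNUM)
		return 0;
	/* Already at or above the configured limit: no headroom to add. */
	if (nr_ranges >= max_ranges)
		return 0;
	return (size_t)(max_ranges - nr_ranges) * sizeof(Elf64_Phdr);
}
```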

v16: 
[https://lore.kernel.org/all/20240217081452.164571-1-sourabhj...@linux.ibm.com/]
  - Remove the unused #define `crash_hotplug_cpu_support`
and `crash_hotplug_memory_support` in `arch/x86/include/asm/kexec.h`.
  - Document why two kexec flag bits are used in
`arch_crash_hotplug_memory_support` (x86).
  - Use a switch case to handle different hotplug operations
in `arch_crash_handle_hotplug_event` for PowerPC.
  - Fix a typo in 4/5.

v15:
  - Remove the patch that adds a new kexec flag for FDT update.
  - Introduce a generic kexec flag bit to share hotplug support
intent between the kexec tool and the kernel for the kexec_load
syscall. (2/5)
  - Introduce an architecture-specific handler to process the kexec
flag for crash hotplug support. (2/5)
  - Rename the @update_elfcorehdr member of the struct kimage to
@hotplug_support. (2/5)
  - Use a common function to advertise hotplug support for both CPU
and Memory. (2/5)
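The v15 idea of sharing hotplug-support intent through a kexec flag bit can be pictured as a simple bitmask check at load time. In the sketch below, the SKETCH_* flag values, struct image_model, and set_hotplug_support() are all hypothetical stand-ins; the real flag bit and the @hotplug_support handling are defined by the patches, not here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only. */
#define SKETCH_KEXEC_ON_CRASH        0x1UL
#define SKETCH_KEXEC_HOTPLUG_SUPPORT 0x8UL

/* Hypothetical, trimmed-down image descriptor (cf. @hotplug_support). */
struct image_model {
	bool hotplug_support;
};

/*
 * Record whether userspace (the kexec tool) declared that the loaded image
 * tolerates in-kernel updates. Only crash (kdump) loads qualify.
 */
static void set_hotplug_support(struct image_model *img, unsigned long flags)
{
	assert(img != NULL);
	img->hotplug_support = (flags & SKETCH_KEXEC_ON_CRASH) &&
			       (flags & SKETCH_KEXEC_HOTPLUG_SUPPORT);
}
```

Requiring an explicit opt-in bit lets an older kexec tool, which never sets it, keep the previous behavior of full reloads.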

v14:
  - Fix build warnings by including necessary header files
  - Rebase to v6.7-rc5

v13:
  - Fix a build warning, take ranges.c out of CONFIG_KEXEC_FILE
  - Rebase to v6.7-rc4

v12:
  - Add a patch that introduces new kexec flags to support this feature
with the kexec_load system call
  - Change the way this feature is advertised to userspace for both the
kexec_load and kexec_file_load syscalls
  - Rebase to v6.6-rc7

v11:
  - Rebase to v6.4-rc6
  - The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been
removed. The